AD-A107 774
UNCLASSIFIED

GEORGIA INST OF TECH ATLANTA SCHOOL OF INFORMATION A--ETC  F/G 9/2
PAPERS ON PROGRAM TESTING (U)
1979  R A DEMILLO, R J LIPTON, F G SAYWARD  N00014-79-C-0231
GIT-ICS-79/04  NL


PAPERS  ON  PROGRAM  TESTING 


Richard A. DeMillo*
Richard J. Lipton**
Frederick G. Sayward***


Summer,  1979 


*  Georgia  Institute  of  Technology 
**  University  of  California,  Berkeley 
***  Yale  University 


This document has been approved
for public release and sale; its
distribution is unlimited.




DISCLAIMER  NOTICE 


THIS  DOCUMENT  IS  BEST  QUALITY 
PRACTICABLE.  THE  COPY  FURNISHED 
TO  DTIC  CONTAINED  A  SIGNIFICANT 
NUMBER  OF  PAGES  WHICH  DO  NOT 
REPRODUCE  LEGIBLY. 


By the Program Mutation Groups at Georgia Tech, the University of California,
Berkeley,  and  Yale  University.  The  research  reported  herein  was  supported  in 
part  by  AIRMICS  through  ARO  grant  no.  DAAG29-78-G-0121  and  by  ONR  grant  no. 
N00014-79-C-0231. 


Foreword


Since  late  1976,  we  have  been  involved  in  what  we  believe  is  a  new 
approach  to  computer  program  testing,  an  approach  called  mutation  analysis 
(and  we  shall  forever  be  indebted  to  Jerome  Feldman  for  suggesting  the 
term).  The  main  novelties  of  the  mutation  approach  to  program  testing  are 
its  simplicity,  its  empirical  basis,  its  ease  of  mechanical  implementation, 
and  its  tractability  for  scientific  analysis.  Although  much  remains  to  be 
learned  about  mutation  as  a  testing  tool,  there  is  a  considerable  body  of 
written  material  which  describes  our  initial  experience  with  the  technique. 

Much  of  this  material  has  appeared  only  in  workshops  or  as  memoranda, 
so  we  have  been  urged  to  collect  it  together  for  wider  dissemination.  The 
current  collection  is  the  result.  The  reader  should  note  that  the 
selections  do  not  appear  in  chronological  order;  rather,  they  are  organized 
so  that  a  sufficiently  patient  reader  may  proceed  from  the  conceptual  basis 
of  mutation  analysis  through  implementation,  application,  and  theoretical 
issues. 

We expect to distill much of this material into a more formal treatment
in the coming months; as always, comments and criticisms of all kinds
will be appreciated.


Richard  A.  DeMillo 

School  of  Information  and  Computer  Science 
Georgia  Institute  of  Technology 


Richard  J.  Lipton 

Department  of  Electrical  Engineering  and 
Computer  Science 

University  of  California,  Berkeley 


Frederick  G.  Sayward 
Department  of  Computer  Science 
Yale  University 


Summer,  1979 


TABLE  OF  CONTENTS 


PART ONE: INTRODUCTION TO MUTATION

1. Hints on Test Data Selection: Help for the Practicing Programmer
2. Program Mutation: A New Approach to Program Testing
3. Mutation Analysis
4. Discussion of "A Survey of Programming Testing Issues"
5. The Status of Research on Program Mutation


PART TWO: APPLICATION ISSUES

1. Program Mutation as a Tool for Managing Large-Scale Software Development
2. Stability of Test Data from Program Mutation


PART THREE: THEORETICAL ISSUES

1. A Probabilistic Remark on Algebraic Program Testing
2. Mutation Analysis of Decision Table Programs
3. Proving LISP Programs Using Test Data


PART FOUR: IMPLEMENTATION ISSUES

1. The Design of a Prototype Mutation System for Program Testing
2. Heuristics for Determining Equivalence of Program Mutations
3. CPMS Users Guide
4. Pilot Mutation System (PIMS) Users' Manual




Hints on Test Data Selection:
Help for the Practicing Programmer

Richard A. DeMillo, Georgia Institute of Technology
Richard J. Lipton and Frederick G. Sayward, Yale University


In many cases tests of a program that uncover simple
errors  are  also  effective  in  uncovering  much  more  complex 
errors.  This  so-called  coupling  effect  can  be  used  to  save 
work  during  the  testing  process. 




Much  of  the  technical  literature  in  software 
reliability  deals  with  tentative  methodologies  and 
underdeveloped techniques; hence it is not surprising
that the programming staff responsible for debugging
a large piece of software often feels ignored.
It  is  an  economic  and  political  requirement  in  most 
production  programming  shops  that  programmers 
shall  spend  as  little  time  as  possible  in  testing.  The 
programmer  must  therefore  be  content  to  test 
cleverly  but  cheaply;  state-of-the-art  methodologies 
always seem to be just beyond what can be afforded.
We intend to convince the reader that much
can  be  accomplished  even  under  these  constraints. 

From  the  point  of  view  of  management,  there  is 
some  justification  for  opposing  a  long-term  view  of 
the  testing  phase  of  the  development  cycle.  Figure  1 
shows the relative effect of testing on the remaining
system bugs for several medium-scale systems
developed  by  System  Development  Corporation.' 
Notice  that  in  the  last  half  of  the  test  cycle,  the 
average  change  in  the  known-error  status  of  a 
system  is  0.4  percent  per  unit  of  testing  effort, 
while  in  the  first  half  of  the  cycle,  1.54  percent  of 
the errors are discovered per unit of testing effort.
Since  it  is  enormously  difficult  to  be  convincing  in 
stating  that  the  testing  effort  is  complete,  the 
apparently  rapidly  decreasing  return  per  unit  of 
effort  invested  becomes  a  dominating  concern.  The 
standard  solution,  of  course,  is  to  limit  the  amount 
of  testing  time  to  the  most  favorable  part  of  the 
cycle. 




How, then, should programmers cope? The
more sophisticated general methodologies are not
likely to be applicable. In addition, they have the
burden  of  convincing  managers  that  their  software 
is  indeed  reliable. 


The  coupling  effect 

Programmers,  however,  have  one  great  advantage 
that  is  almost  never  really  exploited:  they  create 
programs that are close to being correct! Programmers
do not create programs at random; competent
programmers,  in  their  many  iterations  through  the 
design  process,  are  constantly  whittling  away  the 
distance  between  what  their  programs  look  like 
now and what they are intended to look like. Programmers
also have at their disposal

•  a  rough  idea  of  the  kinds  of  errors  most  likely 
to  occur; 

•  the  ability  and  opportunity  to  examine  their 
programs  in  detail. 

Error classifications. In attempting to formulate
a comprehensive theory of test data selection, Susan
Gerhart and John Goodenough have suggested
that errors be classified as follows:

(1) failure to satisfy specifications due to implementation
error;

(2) failure to write specifications that correctly
represent a design;

(3) failure to understand a requirement;

(4) failure to satisfy a requirement.

But  these  are  global  concerns.  Errors  are  always 
reflected  in  programs  as 

•  missing control paths,

•  inappropriate  path  selection,  or 

•  inappropriate  or  missing  actions. 


0018-9162/78/0400-0034$00.75 © 1978 IEEE

COMPUTER


We do not explicitly address classifications (2)
and (3) in this article, except to point out that even
here a programmer can do much without fancy
theories. If we are right in our perception of programs
as being close to correct, then these errors
should be detectable as small deviations from the
intended program. There is an amazing lack of
published data on this subject, but we do have
some idea of the most common errors. E. A. Youngs,
in his PhD dissertation, analyzed 1258 errors in
Fortran, Cobol, PL/I, and Basic programs. The
errors were distributed as shown in Table 1.

In  addition  to  these  errors,  certain  other  errors 
were present in negligible quantities. There were,
for  instance,  operating  system  interface  errors, 
such  as  incorrect  job  identification  and  erroneous 
external  I/O  assignment.  Also  present  were  errors 
in  comments,  pseudo-ops,  and  no-ops  which  for 
various  reasons  created  detectable  error  conditions. 

Complex errors coupled. How, then, do the relatively
simple error types discovered by Youngs
connect with the Gerhart-Goodenough error classification?
Well, the naive answer is that since arbitrarily
pernicious errors may be responsible for a
given failure, it must be that simple errors compound
in more massive error conditions. For the
practical treatment of test data, the Youngs error
statistics, therefore, do not seem to help much at
all. Fortunately though, the observation that programs
are "close to correct" leads us to an assumption
which makes the high frequency of simple
errors very important:

The coupling effect: Test data that distinguishes
all programs differing from a correct one by only
simple errors is so sensitive that it also implicitly
distinguishes more complex errors.

In  other  words,  complex  errors  are  coupled  to 
simple errors. There is, of course, no hope of "proving"
the coupling effect; it is an empirical principle.
If  the  coupling  effect  can  be  observed  in  "real-world" 
programs,  then  it  has  dramatic  implications  for 
testing  strategies  in  general  and  domain-specific, 
limited  testing  in  particular.  Rather  than  scamper 
after  errors  of  undetermined  character,  the  tester 
should  attempt  a  systematic  search  for  simple 
errors  that  will  also  uncover  deeper  errors  via  the 
coupling  effect. 

Path analysis. This point seems so obvious that
it's not worth making: test to uncover errors. Yet
it's a point that's often lost in the shuffle. In a
common methodology known as path analysis, the
point of the test data is to drive a program through
all of its control paths. It is certainly hard to criticize
such a goal, since a thoroughly tested program
must have been exercised in this way. But unless
one recognizes that the test data should also distinguish
errors, he might be tempted to conclude,
for  example,  that  the  program  segment  diagrammed 
in  Figure  2  can  be  tested  by  exercising  paths  1-2 
and  1-3,  even  though  one  of  the  clauses  P  and  Q 


may  not  have  been  affected  at  all!  In  general,  the 
relative  ordering  of  P  and  Q  may  be  irrelevant  or 
partially  unknown  and  side  effects  may  occur,  so 
that  actually  the  eight  paths  shown  in  Figure  3  are 
required  to  ensure  that  the  statement  has  been 
adequately  tested. 


Figure 1. More programming errors are found in the early part of the
test cycle than in the final part. (Horizontal axis: percent of testing
effort, in man-months, computer hours, etc.)


Table 1. Frequency of occurrence of 1258 errors
in Fortran, Cobol, PL/I, and Basic programs.

Error Type                              Relative Frequency
                                        of Occurrence

Error in assignment or computation            .27
Allocation error                              .15
Other, unknown, or multiple errors            .11
Unsuccessful iteration                        .09
Other I/O error                               .07
I/O formatting error                          .06
Error in branching
    unconditional                             .01
    conditional                               .05
Parameter or subscript violation              .05
Subprogram invocation error                   .05
Misplaced delimiter                           .04
Data error                                    .02
Error in location or marker                   .02
Nonterminating subprogram                     .01

April  1978 




Two  examples  given  below  indicate  that  test 
data  derived  to  uncover  simple  errors  can,  in  fact, 
be  vastly  superior  to,  say,  randomly  chosen  data  or 
data  generated  for  path  analysis.  A  byproduct  of 
the  discussion  will  be  some  evidence  for  the  coupling 
effect.  A  third  example  reveals  another  advantage 
of  selecting  test  data  with  an  eye  on  coupling: 
since  it's  a  problem-specific  activity,  there  are 
enhanced possibilities for discovering useful heuristics
for test data selection. This example will
lead  to  useful  advice  for  generating  test  vectors  for 
programs  that  manipulate  arrays. 

Our  groups  at  Yale  University  and  the  Georgia 
Institute  of  Technology  have  constructed  a  system 
whereby  we  can  determine  the  extent  to  which  a 
given  set  of  test  data  has  adequately  tested  a 
Fortran  program  by  direct  measurement  of  the 
number and kinds of errors it is capable of uncovering.
This method, known as program mutation, is
used interactively: A programmer enters from a
terminal a program, P, and a proposed test data
set whose adequacy is to be determined. The mutation
system first executes the program on the test
data; if the program gives incorrect answers then
certainly the program is in error. On the other
hand, if the program gives correct answers, then it
may be that the program is still in error, but the
test data is not sensitive enough to distinguish
that error: it is not adequate. The mutation system
then creates a number of mutations of P that differ
from P only in the occurrence of simple errors (for
instance, where P contains the expression "B.LE.C"
a mutation will contain "B.EQ.C"). Let us call
these mutations P1, P2, ..., Pk.

Now,  for  the  given  set  of  test  data  there  are  only 
two  possibilities: 

(1) on that data P gives different results from
the Pi mutations, or

(2) on that data P gives the same results as
some Pi.

In case (1) Pi is said to be dead: the "error" that
produced Pi from P was indeed distinguished by
the test data. In case (2), the mutant Pi is said to
be live; a mutant may be live for two reasons:

(1) the test data does not contain enough sensitivity
to distinguish the error that gave rise to
Pi, or

(2) Pi and P are actually equivalent programs
and no test data will distinguish them (i.e., the
"error" that gave rise to Pi was not an error at
all).

Test  data  that  leaves  no  live  mutants  or  only  live 
mutants  that  are  equivalent  to  P  is  adequate  in  the 
following  sense:  Either  the  program  P  is  correct  or 
there  is  an  unexpected  error  in  P,  which— by  the 
coupling  effect— we  expect  to  happen  seldom  if  the 
errors  used  to  create  the  mutants  are  carefully 
chosen. 
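
The dead/live bookkeeping described above can be illustrated with a
minimal Python sketch. This is not the mutation system itself: the
function classify_mutants, the one-line program p, and the three
relational mutants are hypothetical names chosen for illustration, and
the sketch assumes that a program's output is simply its return value.

```python
def classify_mutants(program, mutants, tests):
    """Return (dead, live) lists of mutant names for the given test data.

    A mutant is dead if it differs from `program` on at least one test
    case, and live if it agrees with `program` on every test case.
    """
    expected = [program(*t) for t in tests]
    dead, live = [], []
    for name, mutant in mutants.items():
        same = [mutant(*t) for t in tests] == expected
        (live if same else dead).append(name)
    return dead, live


# Hypothetical program under test: it contains the expression B.LE.C.
def p(b, c):
    return b <= c


# Simple-error mutants: the relational operator is replaced.
MUTANTS = {
    "LE->EQ": lambda b, c: b == c,
    "LE->LT": lambda b, c: b < c,
    "LE->GE": lambda b, c: b >= c,
}

WEAK = [(2, 2), (5, 1)]        # never exercises the case B < C
STRONGER = WEAK + [(1, 3)]     # adds a case with B < C

print(classify_mutants(p, MUTANTS, WEAK))      # LE->EQ stays live
print(classify_mutants(p, MUTANTS, STRONGER))  # every mutant is dead
```

A live mutant such as LE->EQ is exactly the kind of "hard question" the
programmer is asked to answer: either the data is too weak, or the
mutant is equivalent to the original program.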

Now,  it  is  not  completely  apparent  that  this 
process  is  computationally  feasible.  But,  as  we 
describe  in  more  detail  elsewhere,  there  is  a  very 
good choice of methodology for generating mutations
to bring the procedure within attractive
economic bounds.

Apparently,  the  information  returned  by  the 
mutation  system  can  be  effectively  utilized  by  the 
programmer.  The  programmer  looks  at  a  negative 
response from the system as a "hard question"
concerning his program (e.g., "The test data you've
given me says it doesn't matter whether or not this
test is for equality or inequality; why is that?")
and  is  able  to  use  his  answers  to  the  question  as  a 
guide  in  generating  more  sensitive  test  data. 


Figure 3. Eight paths may be required for an adequate test.




A  simple  example 

Our  first  example  is  very  simple:  it  involves  the 
max  algorithm  used  for  other  purposes  by  Peter 
Naur in the early 1960's. The task is to set a variable
R to the index of the first occurrence of a
maximum element in the vector A(1), ..., A(N).

For  example,  the  following  Fortran  subroutine 
might  be  offered  as  an  implementation  of  such  an 
algorithm: 

      SUBROUTINE MAX (A,N,R)
      INTEGER A(N),N,R
    1 R = 1
    2 DO 3 I = 2,N,1
    3 IF (A(I).GT.A(R)) R = I
      RETURN
      END

We  will  choose  for  our  initial  set  of  test  data  three 
vectors  (Table  2). 


Table 2. Three vectors constitute the initial
set of test data.

          A(1)  A(2)  A(3)
data 1     1     2     3
data 2     1     3     2
data 3     3     1     2

How sensitive is this data? By inspection, we
notice that if an error had occurred in the relational
operation of the IF statement, then either data 1,
data 2, or data 3 would have distinguished those
errors, except for one case. None of these data
vectors distinguishes .GE. from .GT. in the IF statement.
Similarly, these vectors distinguish all simple
errors in constants except for starting the DO loop
at "1" rather than "2." All simple errors in variables
are likewise distinguished except for the
errors in the IF statement which replace "A(I)" by
"I" or by "A(R)."

That is, if we run the data set above in any of the
following mutants of MAX, we get the same results.

C     DO LOOP STARTS AT 1 RATHER THAN 2
      SUBROUTINE MAX (A,N,R)
      INTEGER A(N),N,R
    1 R = 1
    2 DO 3 I = 1,N,1
    3 IF (A(I).GT.A(R)) R = I
      RETURN
      END

C     A(I) REPLACED BY I
      SUBROUTINE MAX (A,N,R)
      INTEGER A(N),N,R
    1 R = 1
    2 DO 3 I = 2,N,1
    3 IF (I.GT.A(R)) R = I
      RETURN
      END

C     .GT. REPLACED BY .GE.
      SUBROUTINE MAX (A,N,R)
      INTEGER A(N),N,R
    1 R = 1
    2 DO 3 I = 2,N,1
    3 IF (A(I).GE.A(R)) R = I
      RETURN
      END

C     A(I) REPLACED BY A(R)
      SUBROUTINE MAX (A,N,R)
      INTEGER A(N),N,R
    1 R = 1
    2 DO 3 I = 2,N,1
    3 IF (A(R).GT.A(R)) R = I
      RETURN
      END


Let us try to kill as many of these mutants as
possible. In view of the first difficulty, we might
guess that our data is not yet adequate because it
does not contain repeated elements. So, let us add

          A(1)  A(2)  A(3)
data 4     2     2     1

Now, replacing .GT. by .GE. and running on
data 4 gives erroneous results so that all mutants
arising from simple relational errors are dead. Surprisingly,
data 4 also distinguishes the two errors
in A(I); so, we are left with only the last mutant
arising from the "constant" error: variation in beginning
the DO loop. But closer inspection of the program
indicates that starting the DO loop at "1"
rather than "2" has no effect on the program, other
than to trivially increase its running time. So no
choice of test data will distinguish this "error,"
since it results in a program equivalent to MAX. So
we conclude that since the test data 1-4 leaves only
live mutants that are equivalent to MAX, it is
adequate.
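
The effect of adding data 4 can be checked with a small Python sketch.
The functions below are hypothetical Python renderings of MAX and of
three of its mutants (the .GE. mutant, the mutant that replaces A(I) by
I, and the mutant that starts the loop at 1); they assume 1-based
indexing as in the Fortran and are not output of the mutation system.

```python
def make_max(start=2, rel=lambda x, y: x > y, use_index=False):
    """Build a variant of MAX; the defaults give the original program."""
    def max_variant(vector):
        a = [None] + list(vector)          # 1-based indexing, as in Fortran
        r = 1
        for i in range(start, len(a)):
            left = i if use_index else a[i]  # mutant: A(I) replaced by I
            if rel(left, a[r]):
                r = i
        return r
    return max_variant


max_original = make_max()
MUTANTS = {
    "loop starts at 1": make_max(start=1),
    "A(I) replaced by I": make_max(use_index=True),
    ".GT. replaced by .GE.": make_max(rel=lambda x, y: x >= y),
}

DATA_1_TO_3 = [(1, 2, 3), (1, 3, 2), (3, 1, 2)]
DATA_1_TO_4 = DATA_1_TO_3 + [(2, 2, 1)]


def live_mutants(tests):
    """Names of mutants that agree with the original on every vector."""
    return [name for name, m in MUTANTS.items()
            if all(m(t) == max_original(t) for t in tests)]


print(live_mutants(DATA_1_TO_3))  # all three mutants survive
print(live_mutants(DATA_1_TO_4))  # only the equivalent mutant survives
```

The one survivor on data 1-4 is the mutant that starts the loop at 1,
which compares A(1) with itself and is therefore equivalent to MAX.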


Comparisons  with  path  analysis 


This  example  illustrates  hidden  paths  in  a  program 
which  should  also  be  exercised  by  the  test  data.  To 
illustrate what hidden paths are, consider the
Fortran program (call it P) suggested by C. V.
Ramamoorthy and his colleagues.

      INTEGER A,B,C,D
      READ 10,A,B,C
   10 FORMAT(4I10)
    5 IF ((A.GE.B).AND.(B.GE.C)) GOTO 100
      PRINT 50
   50 FORMAT(1H ,'LENGTH OF TRIANGLE NOT IN
     1ORDER')
      STOP
  100 IF ((A.EQ.B).OR.(B.EQ.C)) GOTO 500
      A = A*A
      B = B*B
      C = C**2
      D = B+C
      IF (A.NE.D) GOTO 200
      PRINT 150
  150 FORMAT(1H ,'RIGHT ANGLED TRIANGLE')
      STOP
  200 IF (A.LT.D) GOTO 300
      PRINT 250
  250 FORMAT(1H ,'OBTUSE ANGLED TRIANGLE')
      STOP
  300 PRINT 350
  350 FORMAT(1H ,'ACUTE ANGLED TRIANGLE')
      STOP
  500 IF ((A.EQ.B).AND.(A.EQ.C)) GOTO 600
      PRINT 550
  550 FORMAT(1H ,'ISOCELES TRIANGLE')
      STOP
  600 PRINT 650
  650 FORMAT(1H ,'EQUILATERAL TRIANGLE')
      STOP
      END

The  intent  of  this  program  is  to  categorize  triangles, 
given  the  lengths  of  their  sides.  A  typical  path 
analysis  system  will  derive  test  data— call  it  T— 
which  exercises  all  paths  of  P  (Table  3). 


P and P' differ only in the logical expressions
found at statements 5 and 100.* The test data T
does not sufficiently test the compound logical
expressions of P: T only tests the single-clause
logicals found in the corresponding statements of
P'. Hence, T' (T together with the test cases of
Table 4) is a stronger test of P than is T (i.e.,
for P we have more confidence in the adequacy of
T' than in the adequacy of T). Note that the logical
expression in statement 5 of P could be replaced
by B.GE.C to yield a program P'' which produces
correct answers on T'. The test case A=5, B=7,
C=6 will remedy this and provide still a stronger
test of P.


Table 3. Test data T to exercise the Fortran program P.

TEST CASE    A    B    C   TRIANGLE TYPE
    1        2   12   27   ILLEGAL
    2        5    4    3   RIGHT ANGLE
    3       26    7    7   ISOSCELES
    4       19   19   19   EQUILATERAL
    5       14    6    4   OBTUSE
    6       24   23   21   ACUTE

Now consider the following mutant program P':

      INTEGER A,B,C,D
      READ 10,A,B,C
   10 FORMAT(4I10)
    5 IF (A.GE.B) GOTO 100
      PRINT 50
   50 FORMAT(1H ,'LENGTH OF TRIANGLE NOT IN
     1ORDER')
      STOP
  100 IF (B.EQ.C) GOTO 500
      A = A*A
      B = B*B
      C = C**2
      D = B+C
      IF (A.NE.D) GOTO 200
      PRINT 150
  150 FORMAT(1H ,'RIGHT ANGLED TRIANGLE')
      STOP
  200 IF (A.LT.D) GOTO 300
      PRINT 250
  250 FORMAT(1H ,'OBTUSE ANGLED TRIANGLE')
      STOP
  300 PRINT 350
  350 FORMAT(1H ,'ACUTE ANGLED TRIANGLE')
      STOP
  500 IF ((A.EQ.B).AND.(A.EQ.C)) GOTO 600
      PRINT 550
  550 FORMAT(1H ,'ISOCELES TRIANGLE')
      STOP
  600 PRINT 650
  650 FORMAT(1H ,'EQUILATERAL TRIANGLE')
      STOP
      END

P' prints the same answers as P on T but P' is
clearly incorrect since it categorizes the two test
cases shown in Table 4 as acute angle triangles.


Table 4. Two test cases are acute angle triangles.

TEST CASE    A    B    C   TRIANGLE TYPE
    7        7    5    6   ILLEGAL
    8       26   26    7   ISOSCELES
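
The comparison of P and P' can be replayed with a short Python sketch.
The function classify is a hypothetical transliteration of the two
Fortran listings (the flag mutant selects the single-clause conditions
of P'), and it returns the labels used in Tables 3 and 4 rather than
printing the Fortran messages.

```python
def classify(a, b, c, mutant=False):
    """Triangle classifier P; with mutant=True, the mutant P'."""
    # Statement 5: P tests both clauses, P' only A.GE.B.
    in_order = (a >= b) if mutant else (a >= b and b >= c)
    if not in_order:
        return "ILLEGAL"
    # Statement 100: P tests both clauses, P' only B.EQ.C.
    two_equal = (b == c) if mutant else (a == b or b == c)
    if two_equal:
        # Statement 500
        return "EQUILATERAL" if (a == b and a == c) else "ISOSCELES"
    a2, d = a * a, b * b + c * c
    if a2 == d:
        return "RIGHT ANGLE"
    return "ACUTE" if a2 < d else "OBTUSE"


# Test data T of Table 3 and the two extra cases of Table 4.
T = [(2, 12, 27), (5, 4, 3), (26, 7, 7), (19, 19, 19), (14, 6, 4), (24, 23, 21)]
TABLE_4 = [(7, 5, 6), (26, 26, 7)]

# P and P' agree on every case in T ...
print(all(classify(*t) == classify(*t, mutant=True) for t in T))
# ... but disagree on both cases of Table 4.
for case in TABLE_4:
    print(case, classify(*case), classify(*case, mutant=True))
```

Every path of P is exercised by T, yet T cannot tell P from P'; the two
cases of Table 4 are the ones that separate them.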

A  more  substantial  example 

Our last example involves the FIND program of
C.A.R. Hoare. FIND takes, as input, an integer array
A, its size N ≥ 1, and an array index F, 1 ≤ F ≤ N.
After execution of FIND, all elements to the left of
A(F) have values no larger than A(F) and all elements
to the right are no smaller. Clearly, this could be
achieved by sorting A; indeed, FIND is an inner
loop of a fast sorting algorithm, although FIND
executes faster than any sorting program. The
Fortran version of FIND, translated directly from
the Algol version, is given below:

      SUBROUTINE FIND(A,N,F)
C
C     FORTRAN VERSION OF HOARE'S FIND
C     PROGRAM (DIRECT TRANSLATION OF
C     THE ALGOL 60 PROGRAM FOUND IN
C     HOARE'S "PROOF OF FIND" ARTICLE
C     IN CACM 1971).
C
      INTEGER A(N),N,F
      INTEGER M,NS,R,I,J,W
      M = 1
      NS = N
   10 IF (M.GE.NS) GOTO 1000
      R = A(F)
      I = M
      J = NS
   20 IF (I.GT.J) GOTO 60
   30 IF (A(I).GE.R) GOTO 40
      I = I+1
      GOTO 30
   40 IF (R.GE.A(J)) GOTO 50
      J = J-1
      GOTO 40
   50 IF (I.GT.J) GOTO 20
C
C     COULD HAVE CODED GO TO 60 DIRECTLY
C     -DIDN'T BECAUSE THIS REDUNDANCY
C     IS PRESENT IN HOARE'S ALGOL
C     PROGRAM DUE TO THE SEMANTICS OF
C     THE WHILE STATEMENT
C
      W = A(I)
      A(I) = A(J)
      A(J) = W
      I = I+1
      J = J-1
      GOTO 20
   60 IF (F.GT.J) GOTO 70
      NS = J
      GOTO 10
   70 IF (I.GT.F) GOTO 1000
      M = I
      GOTO 10
 1000 RETURN
      END

*The clause A.EQ.B in statement 500 is redundant.

FIND is of particular interest for us because a
subtle multiple-error mutant of FIND, called BUGGYFIND,
has been extensively analyzed by SELECT, a
system that generates test data by symbolic execution.
In FIND, the elements of A are interchanged
depending on a conditional of the form

X .LE. A(F) .AND. A(F) .LE. Y

Since A(F) itself may be exchanged, the effect of
the test is preserved by setting a temporary variable
R = A(F) and using the conditional

X .LE. R .AND. R .LE. Y

In BUGGYFIND, the temporary variable R is not
used; rather, the first form of the conditional is
used to determine whether the elements of A are
to be exchanged. The SELECT system derived the
test data A = (3,2,0,1) and F = 3, on which BUGGYFIND
fails. The authors of SELECT observed that
BUGGYFIND fails on only 2 of the 24 permutations
of (0,1,2,3), indicating that the error is very subtle.*
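
The failure on A = (3,2,0,1), F = 3 can be reproduced with a Python
sketch. The function find below is a hypothetical transliteration of
the Fortran FIND using 1-based indexing; with buggy=True it compares
against A(F) directly instead of the temporary R, which is our reading
of the BUGGYFIND description above and an assumption, not the SELECT
authors' code. The helper partitioned is likewise hypothetical.

```python
from itertools import permutations


def find(vector, f, buggy=False):
    """Hoare's FIND on a copy of `vector`; `f` is a 1-based index."""
    a = [None] + list(vector)              # 1-based indexing
    m, ns = 1, len(vector)
    while m < ns:
        r = a[f]
        i, j = m, ns
        while i <= j:
            # BUGGYFIND re-reads A(F), which may have been exchanged.
            while a[i] < (a[f] if buggy else r):
                i += 1
            while (a[f] if buggy else r) < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if f <= j:
            ns = j
        elif i <= f:
            m = i
        else:
            break
    return a[1:]


def partitioned(result, f):
    """True if nothing left of A(F) is larger and nothing right is smaller."""
    pivot = result[f - 1]
    return (all(x <= pivot for x in result[:f - 1])
            and all(x >= pivot for x in result[f:]))


print(find((3, 2, 0, 1), 3))              # correct FIND
print(find((3, 2, 0, 1), 3, buggy=True))  # BUGGYFIND leaves 2 left of 1
```

On this input the correct version ends with A(3) = 2 in its sorted
position, while the buggy version ends with A(3) = 1 and a larger
element to its left.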

We will first describe a simple-error analysis of
the mutants of FIND, beginning with initially naive
guesses of test data and finishing with a surprisingly
adequate set of 7 A vectors. This data will
be called D1. The detailed analysis needed to determine
how many errors are distinguished by a data
set was carried out on the mutation system at
Yale University.

We  have  asked  several  colleagues  how  they 
would  test  FIND,  and  they  have  nearly  unanimously 
replied  that  they  would  use  permutations.  We  first 
describe analysis which we have done using permutations
of the array indices as data elements. In
one case, we use all permutations of length 4 and
in another case, we use random permutations of
lengths 5 and 6. Surprisingly, the intuitively appealing
choice of permutations as test data is a very
poor one.

We  then  describe  analysis  in  which  another 
popular  intuitive  method  is  used:  random  data.  We 
show  that  the  adequacy  of  random  data  is  very 
dependent  on  the  interval  from  which  the  data  is 
drawn  (i.e.,  problem-specific  information  is  needed 
to  obtain  good  results). 

Finally, we find evidence for the coupling effect (i.e.,
adequate simple-error data kills multiple-error mutants)
in two ways. First, the multiple-error mutant
BUGGYFIND fails on the test data D1. Next, we
describe the very favorable results of executing
random multiple-error mutants of FIND on D1.

We begin the analysis with the 24 permutations
of (0,1,2,3) with F fixed at 3. The results are surprisingly
poor, as 58 live mutants are left. That is,
with these 24 vectors there are 58 possible changes
that could have been made in FIND that would have
yielded identical output. Eventually, by increasing
the number of A vectors to 49, only 10 live mutants
remain. Using a data reduction heuristic, the 49 A
vectors can be reduced to a set of seven A vectors,
leaving 14 live mutants. These vectors appear in
Table 5.

*We found that BUGGYFIND failed on only the aforementioned
permutation.


Table 5. D1: the simple-error adequate data for FIND.


TEST CASE

A 

1 

i  34  n  4  ;>«’ 

1 V  .'??  -  v  1/> 

‘l 

v 

[7  l)  7) 

A 

CV  A4  l>» 

4 

t  -  b  -b  -  -  ‘n 

5 

(1  3  ?  0- 

3 

6 

10.2  3  d 

3 

7 

(01 

In constructing the initial data, after the 24 permutations,
the 49 A vectors were chosen somewhat
haphazardly at first. Later, A vectors were chosen
specifically to eliminate a small subset of the
remaining errors. There were some interesting
observations concerning the 49 vectors:

(1) The average A vector kills about 550 mutants.

(2) The "best" A vector kills 703 mutants (test
case 1 of Table 5).

(3) The "worst" A vector kills only 70 mutants.
This was the degenerate A = (0).

The data reduction heuristic uses both the best
and the worst A vectors to pare the 49 A vectors
to seven.

The final step in showing that the data of Table 5
is indeed adequate is to show that the 14 remaining
mutants are programs that are actually equivalent
to FIND. That is, the 14 "errors" that could
have been made are not really errors at all. One
might be surprised at the large number of equivalent
mutants (approximately 2 percent). This we
attribute to FIND's long history (it was first published
in 1961). Over the years, FIND has been
"honed" to a very efficient state, so efficient that
many slight variations result in equivalent but
slower programs. For example, the conditional

I .GT. F

in the statement labeled 70 in FIND can be
replaced by any logically false conditional, or the
IF statement can be replaced by a CONTINUE statement,
to result in an equivalent but slower program.
It is not likely that this phenomenon will occur
in programs which haven't been "fine-tuned." We
estimate that production programs have well under
1 percent equivalent mutants.

Let us now compare D1 with exhaustive tests on
permutations of (0,1,2,3) and then with tests on
random permutations of (0,1,2,3,4) and (0,1,2,3,4,5).
Table 6 describes the results for all permutations
of (0,1,2,3).


Table 6. Results of all permutations of (1,2,3,4).

NUMBER OF      VALUES OF F     NUMBER OF
TEST CASES                     LIVE MUTANTS

    24              1              158
    24              2               60
    24              3               58
    24              4              141
    96          1, 2, 3 & 4         38

In Table 7 the same information is provided for
the case of random test data.


Table 7. Results of random permutations.

NUMBER OF     SIZE OF A        VALUE OF F         NUMBER OF
RANDOM                                            LIVE MUTANTS
TEST CASES

    10        UNIFORM FROM     UNIFORM FROM           88
              [5,6]            1 TO SIZE OF A
    25        "                "                      65
    50        "                "                      54
   100        "                "                      54
  1000        "                "                      53

Although the intervals in Table 8 are poor, one
could conceive of worse intervals. For example,
draw A from [1, size of A]. However, in view of the
permutation results, such data will surely behave
worse than that of Table 8.

Three points are in order. First, even with very
bad data, D1 is much better than simple permutations.
Second, it took 1000 very good random
vectors to perform as well as D1. Third, using
random vectors yields little insight. The insight
gained in constructing D1 was crucial to detecting
the equivalent versions of FIND.

The coupling effect shows itself in two ways.
First, buggyfind fails on the adequate D1; hence,
we have a concrete example of the coupling effect.
Although the second observation involves randomness,
and thus is indirect, it is perhaps more
convincing than the "one point" concrete buggyfind
example. We have randomly generated a large
number of k-error mutants for k > 1 (called
higher-order mutants) and executed them on D1.

Because the number of mutants produced by complex
errors can grow combinatorially, it is hopeless
to try the complete mutation analysis on complex
mutants, but it is possible to select mutants at
random for execution on D1. Of more than 22,000
higher-order errors encountered, only 19 succeed
on D1. These 19 have been shown to be equivalent
to find. Indeed, we have yet to produce an incorrect
higher-order mutant which succeeds on D1.
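
The random selection of higher-order mutants described above can be sketched
as follows. This is only an illustration in Python, not the system used for the
experiments: the function name sample_higher_order, the SITES table, and the
representation of a mutant as a mapping from mutation site to simple change
are all assumptions made for the example.

```python
import random

def sample_higher_order(sites, k, count, seed=0):
    """Draw `count` random k-error mutants.

    `sites` maps a mutation site (here a statement number) to the list of
    simple changes available at that site. A k-error mutant applies one
    simple change at each of k distinct sites.
    """
    rng = random.Random(seed)
    site_ids = sorted(sites)
    mutants = []
    for _ in range(count):
        chosen = rng.sample(site_ids, k)              # k distinct sites
        mutants.append({s: rng.choice(sites[s]) for s in chosen})
    return mutants

# Hypothetical sites and replacements, for illustration only.
SITES = {
    1: ["I = I - 1", "I = I + 2"],
    2: ["J = J + 1", "J = N"],
    3: ["IF (A(I) .LE. X)", "IF (A(I) .GT. X)"],
}

sample = sample_higher_order(SITES, k=2, count=5)
```

Each sampled mutant would then be executed on the test data in the same way as
a first-order mutant; only the number of candidates, not the procedure, changes.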


As the data indicates, permutations give rather
poor results compared with D1.

Our analysis with random data can be divided
into two cases: runs in which the vectors were
drawn from poorly chosen intervals and runs in
which the vectors were chosen from a good interval,
[-100,100]. The results are described in Tables 8
and 9.


Table 8. Results of random data from poorly chosen intervals.

NUMBER OF   RANGE OVER             RANGE OVER    VALUE OF F        NUMBER OF
RANDOM      WHICH VECTOR           WHICH SIZE                      LIVE MUTANTS
VECTORS     VALUES DRAWN           OF A DRAWN

   10       [100,200]              [1,20]        UNIFORM FROM      28
   10       [-200,-100]            [1,20]        SIZE OF           28
   10       (illegible in source)  [1,20]        VECTOR            25

Table 9. Results of random data drawn from [-100,100];
other parameters as in Table 8.

NUMBER OF          NUMBER OF
RANDOM VECTORS     LIVE MUTANTS

     10            22
     50            17
    100            11
   1000            10

Conclusions 

Our first conclusion is that systematically pursuing
test data which distinguishes errors from a
given class of errors also yields "advice" to be
used in generating test data for similar programs.
For instance, the examples above lead us to the
following principles for creating random or nonrandom
test data for Fortran-like programs which
manipulate arrays (i.e., programs in which array
values can also be used as array indices):

(1) Include cases in which array values are outside
the size of the array.

(2) Include cases in which array values are
negative.

(3) Include cases in which array values are repeated.

(4) Include such degenerate cases as D1's A = (0)
and A = (-5, -5, -5, -5).

Principle (4) was also noticed by Goodenough and
Gerhart [3].
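
As a rough illustration of how the four principles might be turned into test
data, the sketch below builds one array per principle. It is written in Python
for brevity and is not part of the original study; the function name
array_test_cases and the particular value ranges are assumptions, and the
sketch assumes size is at least 2.

```python
import random

def array_test_cases(size, seed=0):
    """Return test arrays for a program that uses array values as indices."""
    rng = random.Random(seed)
    cases = []
    # (1) values outside the range 1..size of valid indices
    cases.append([rng.randint(size + 1, 10 * size) for _ in range(size)])
    # (2) negative values
    cases.append([-rng.randint(1, 10 * size) for _ in range(size)])
    # (3) repeated values: the first two elements are forced to be equal
    v = rng.randint(1, size)
    cases.append([v, v] + [rng.randint(1, size) for _ in range(size - 2)])
    # (4) degenerate cases: a single element and an all-equal array
    cases.append([0])
    cases.append([-5] * size)
    return cases

cases = array_test_cases(4)
```

Fixing the seed keeps the generated data reproducible, so that a mutant which
survives one run can be re-examined against exactly the same arrays.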

It is important that a testing strategy be conducive
to the formation of hypotheses about the
way test data should be selected in future tasks.
Information transferred between programming tasks
provides a source of "virtual resources" to be used
in subsequent work. Since the amount of available
resources is limited by economic and political
barriers, experience, which has the effect of expanding
resources, takes on a special importance. It is,
of course, helpful to have available such mechanical
aids as the mutation system, but as we have shown,
even in the absence of the appropriate statistical
information, a programmer can be reasonably confident
that he is improving his test data selection
strategy.

A  second  conclusion  is  that  until  more  general 
strategies for systematic testing emerge, programmers
are probably better off using the tools and
insights  they  have  in  great  abundance.  Instead  of 
guessing  at  deeply  rooted  sources  of  error,  they 
should  use  their  specialized  knowledge  about  the 
most  likely  sources  of  error  in  their  application. 
We  have  tried  to  illustrate  that  seemingly  simple 
tests  can  be  quite  sensitive,  via  the  coupling  effect. 

The techniques we advocate here are hardly ever
general techniques. In a sense, they require one to
deal directly in the details of both coding and the
application, a notion that is certainly contrary to
currently popular methodologies for validating
software. But we believe there is ample evidence in
man's intellectual history that he does not solve
important problems by viewing them from a distance.
In fact, there is an Alice in Wonderland
quality to fields which claim they can solve other
people's problems without knowing anything in
particular about the problems.

So, there is certainly no need to apologize for
applying ad hoc strategies in program testing. A
programmer who considers his problems well and
skillfully applies appropriate techniques to their
solution, regardless of where the techniques arise,
will succeed.


References 

1. A. E. Tucker, "The Correlation of Computer Program
Quality with Testing Effort," System Development
Corporation, TM 2219/000/00, January 1965.

2. R. A. DeMillo, R. J. Lipton, A. J. Perlis, "Social
Processes and Proofs of Programs and Theorems," Proc.
Fourth ACM Symposium on Principles of Programming
Languages, pp. 206-214. (To appear in CACM.)

3. John B. Goodenough and Susan L. Gerhart, "Toward
a Theory of Test Data Selection," Proc. International
Conference on Reliable Software, SIGPLAN Notices,
Vol. 10, No. 6, June 1975, pp. 493-510.

4. E. A. Youngs, Error-Proneness in Programming, PhD
thesis, University of North Carolina, 1971.

5. T. A. Budd, R. A. DeMillo, R. J. Lipton, F. G. Sayward,
"The Design of a Prototype Mutation System for Program
Testing," Proc. 1978 NCC.

6. C. V. Ramamoorthy, S. F. Ho, and W. T. Chen, "On
the Automated Generation of Program Test Data,"
IEEE Trans. on Software Engineering, Vol. SE-2, No. 4,
December 1976, pp. 293-300.

7. C. A. R. Hoare, "Algorithm 65: FIND," CACM, Vol. 4,
No. 1, April 1961, p. 321.

8. R. S. Boyer, B. Elspas, K. N. Levitt, "SELECT-A
System for Testing and Debugging Programs by
Symbolic Execution," Proc. International Conference
on Reliable Software, SIGPLAN Notices, Vol. 10,
No. 6, June 1975, pp. 234-245.




Richard  DeMillo  has  been  an  associate 
professor  of  computer  science  at  the 
Georgia  Institute  of  Technology  since 
1976.  During  the  four  years  prior 
to  that  he  was  assistant  professor  of 
computer  science  at  the  University  of 
Wisconsin-Milwaukee. 

A  technical  consultant  to  several 
government  and  research  agencies  and 
to  private  industry,  he  is  interested 
in  the  theory  of  computing,  programming  languages, 
and  programming  methodology. 

DeMillo received the BA in mathematics from the
College of St. Thomas, St. Paul, Minnesota, and the
PhD in information and computer science from the
Georgia Institute of Technology. He is a member of
ACM, the American Mathematical Society, AAAS, and
the Association for Symbolic Logic.


Richard J. Lipton is an associate professor of computer
science at Yale University. A faculty member since 1973,
he pursues research interests in computational complexity
and in mathematical modeling of computer systems. He
is also a technical consultant to several government
agencies and to private industry.
Lipton received the BS in mathematics from Case
Western Reserve University and the PhD from
Carnegie-Mellon University.


Frederick G. Sayward is an assistant professor of computer
science at Yale University, where he pursues
research interests in semantical methods for programming
languages, the theory of parallel computation as
applied to operating systems, the development of
programming test methods, and techniques for fault-tolerant
computation. Earlier, he worked as a scientific and systems
programmer at MIT Lincoln Laboratory.

A member of ACM, the American Mathematical Society,
and Sigma Xi, Sayward received the BS in mathematics
from Southeastern Massachusetts University, the MS in
computer science from the University of Wisconsin-Madison,
and the PhD in applied mathematics from
Brown University.


April 1978


PROGRAM MUTATION: A NEW APPROACH TO PROGRAM TESTING


R  A  DeMillo 

School  of  Information  and  Computer  Science 
Georgia  Institute  of  Technology 
Atlanta  GA 


R  J  Lipton 


F  G  Sayward 


Department  of  Computer  Science 
Yale University
New  Haven  CT 


ACKNOWLEDGEMENT


We acknowledge the work of Tim Budd and Mike Lebowitz and the other members of the
Yale University Testing Group for help in implementing and experimenting with the
prototype mutation system.


(c)  R  A  DeMillo,  R  J  Lipton  and  F  G  Sayward  1979 


10 

PROGRAM  MUTATION:  A  NEW  APPROACH  TO  PROGRAM  TESTING 


ABSTRACT 

Unlike contemporary software validation methods, where the goal is to establish absolute
program correctness, program mutation is a testing method which has a less ambitious
but quite useful goal: to establish that a program is either correct or is 'radically'
incorrect. The basic concepts of program mutation are explained as well as how the
method is applied in building interactive program mutation systems which aid users in
establishing this goal. Also, the applications of program mutation as a software project
management tool and as a tool for assessing the quality of procured software are
overviewed. A prototype mutation system for a non-trivial subset of FORTRAN has been
implemented and initial experience with this system is reported. A system for nearly
full ANSI FORTRAN is about half implemented and is expected to be ready by early Fall
1978.


INTRODUCTION 

Program testing is an inductive science which addresses the following fundamental
question:

If  a  program  is  correct  on  a  finite  number  of  test  cases,  is  it  correct  in  general? 

Finite  test  data  which  implies  general  correctness  is  called  adequate  test  data  (004) 
and  since  adequate  test  data  cannot  in  general  be  derived  algorithmically  (003),  program 
testing  cannot  in  general  be  deductive.  Recently,  path  analysis  (001,002,005,006)  and 
symbolic  execution  (007,008)  have  emerged  as  methods  which  allow  one  to  gain  confidence 
in  one's  test  data's  adequacy.  Although  as  with  any  inductive  science  it  is  possible 
to  make  false  inferences  with  path  analysis,  the  basic  idea  is  undeniable:  test  data 
which  exercises  all  flowchart  control  paths  of  a  program  at  least  once  must  be  better 
than  test  data  which  does  not. 

It has been said (020) that from a scientific point of view program testing can hardly
be said to be in its infancy. The software engineering community, most notably the
program verification school, continues to point out that program testing is insufficient
to guarantee program correctness (see (019) for an argument against program verification).
We agree. However, since program testing has been used in developing all software
that has ever solved any real problems, we must ask the following rather obvious
question:

Given that program testing, while not a perfect technique, has proved to be a very
useful technique, how can we develop testing methodologies which have less than
perfection (absolute program correctness) as their goals yet still yield substantial
gains?

It  is  all  too  easy  (and  wrong)  to  take  the  popular  viewpoint  that  program  building  is 
a  purely  logical  deductive  activity  to  which  program  testing  is  unsuitable.  Our  view¬ 
point  is  that  program  design  and  development  is  an  empirical  engineering  activity  for 
which  an  inferential  formalism  has  not  yet  been  developed.  However,  it  seems  clear 
that such a formalism is not entirely necessary if one is willing to accept that programming
is a human, inductive activity which may never be subject to complete formalism.

In  this  paper  we  will  describe  an  on-going  research  effort  which  is  aimed  at  achieving 
gains from program testing while not ensuring perfection. We call our testing methodology
program mutation. Besides discussing the method, we will overview a prototype
system which implements the method and report our initial experiences with program
mutation. Moreover, unlike the deductive approaches to software reliability such as
program verification (009,010), program mutation provides quantitative information on
the  status  of  software  development.  We  will  explain  how  this  information  can  be  used 
effectively  throughout  a  software  project's  management  hierarchy.  We  will  also  explain 
how  it  can  be  used  as  a  quality  measure  for  procured  software. 


THE  PROGRAM  MUTATION  METHODOLOGY 

It  has  been  observed  (011,022)  that  the  vast  majority  of  errors  that  remain  in  software 
once  it  has  been  tested  and  put  into  production  tend  not  to  be  radical  errors*  but 
rather  are  interacting  combinations  of  simple  errors.  Indeed,  there  are  many  'horror' 
stories similar to the failure of an early Vanguard missile launch because of a missing
right  parenthesis  in  a  controlling  program.  So  a  reasonable  goal  of  program  testing 
is  to  rule  out  all  combinations  of  simple  errors.  That  is,  design  a  program  testing 
method  with  the  goal  being  that  if  a  program  passes  the  test  then  either: 

1  The  program  is  correct. 

2  The  program  is  radically  incorrect. 

Even  this  seems  too  ambitious  if  one  attacks  directly.  First,  given  a  program  we  must 
be able to generate all of its simple errors. Assuming that this can be done, we next
must eliminate the simple errors and the complex errors which emanate from their combinations.
Clearly the number of complex errors will be a combinatorial explosion in
the number of simple errors. While it may be feasible to eliminate all simple errors,
explicit  elimination  of  all  complex  errors  appears  intractable. 

The goal of the program mutation testing methodology is to establish that a given program
is either correct or radically incorrect. Let L be the programming language under
consideration. A mutant operator is a simple program transformation, dependent on L,
which produces mutant programs of a given program P. The mutants are also programs in L.


* There are no agreed-on technical definitions of error categories. We too will be informal. By
radical we mean errors due to grossly misunderstanding the program specifications: errors which are
difficult if not impossible to capture by general algorithmic methods but which would easily be
observed by almost any test or when the software is first put into production. An example would be
forgetting to include an action sequence in a decision table program.


For example, if

I = I + 1

is a statement in P, then

I = I - 1
I = I + 2
I = I + 0 (i.e., a no-op)

are all simple changes which lead to three mutants of P. The goal of the mutant operator
is to introduce simple errors in P, thus producing mutants of P. Alternatively,
if P is incorrect due to a single simple error, some mutant would be a correct program
for the given task. There should be several mutant operators, each corresponding to
different classes of simple errors that may occur in L. Let m(P) denote the set of
all mutants of P. Ideally, m(P) should contain mutants corresponding to all and only
the possible simple errors. However, this is too ambitious a goal for general purpose
program transformations and we relax the requirement to be that m(P) covers all simple
errors in the sense that m(P) may also contain mutants which are equivalent to P. We
let m*(P) denote all the mutants of P which come from multiple applications of mutant
operators on P. These mutants are also programs in L.
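
To make the idea of a mutant operator concrete, the following sketch applies two
simple operators to a single assignment statement. It is a Python illustration
under assumed conventions, not the FORTRAN system described in this paper: a
statement is modelled as a tuple (target, left operand, operator, constant), and
the names operator_mutants, constant_mutants and first_order_mutants are
hypothetical.

```python
ARITH_OPS = ["+", "-", "*", "/"]

def operator_mutants(stmt):
    """Replace the arithmetic operator by each of the other operators."""
    target, left, op, const = stmt
    return [(target, left, other, const) for other in ARITH_OPS if other != op]

def constant_mutants(stmt):
    """Replace the constant by its two neighbouring constants."""
    target, left, op, const = stmt
    return [(target, left, op, c) for c in (const - 1, const + 1)]

def first_order_mutants(stmt):
    """All mutants obtained by one application of one mutant operator."""
    return operator_mutants(stmt) + constant_mutants(stmt)

def show(stmt):
    target, left, op, const = stmt
    return f"{target} = {left} {op} {const}"

shown = [show(s) for s in first_order_mutants(("I", "I", "+", 1))]
```

For the statement I = I + 1 this yields five mutants, among them the three
listed in the text (I = I - 1, I = I + 2 and I = I + 0). Applying
first_order_mutants again to each result would give a sample of m*(P).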

Let D be the input domain of P. P is said to pass the mutation test with data T if
there exists T, a subset of D, such that:

• P works as intended on T

• For each mutant m in m(P) either

  - m fails to work as intended on T, or

  - m is equivalent to P.
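
The definition above can be read as a simple check. The sketch below is one
possible rendering in Python and makes several assumptions that are not in the
text: programs and mutants are ordinary callables, "works as intended" is
supplied as a predicate intended(x, y), and equivalence is not computed but
given as a set of mutant names judged equivalent by the user.

```python
def passes_mutation_test(program, mutants, equivalent, tests, intended):
    """Return (passed, live) for the test data `tests`.

    program    -- callable under test
    mutants    -- dict mapping a mutant name to a callable
    equivalent -- set of mutant names judged equivalent to the program
    intended   -- intended(x, y) is True when y is acceptable output for x
    """
    # The program itself must work as intended on every test case.
    if not all(intended(x, program(x)) for x in tests):
        return False, sorted(mutants)
    # A mutant is live if it is not equivalent and no test case makes it fail.
    live = sorted(
        name for name, mutant in mutants.items()
        if name not in equivalent
        and all(intended(x, mutant(x)) for x in tests)
    )
    return len(live) == 0, live

program = lambda i: i + 1
mutants = {
    "I = I - 1": lambda i: i - 1,
    "I = I + 2": lambda i: i + 2,
    "I = I + 0": lambda i: i + 0,
}
intended = lambda x, y: y == x + 1

passed, live = passes_mutation_test(program, mutants, set(), [0], intended)
```

With the single test case 0 all three mutants fail, so the test is passed; with
no test cases at all every mutant stays live and the test is not passed.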

If P passes the mutation test then we are sure that P is free of simple errors. But what
of complex errors? To this end we have observed a coupling effect which states:

Test data T which causes all the non-equivalent mutants of m(P) to fail is so sensitive
that all the non-equivalent mutants of m*(P) must also fail on T.

The justification of the coupling effect parallels the probabilistic argument for
justifying the single fault methods used to test circuits (021). However, we have
no theory to make it a hard-and-fast principle. Basically, if several simple errors (detectable
by T) combine to make a complex error then it is extremely unlikely the simple
errors will cancel to allow the successful execution on T of the mutant containing the
complex error. The goal of program mutation theory is then to validate, depending on
L either deductively or experimentally, the coupling effect for language L by establishing
the following metatheorem of program mutation:

• If P passes the mutation test then either

  - P is correct, or

  - P is radically incorrect.

It is not hard to see that if the metatheorem holds for language L and if P is a non-radically
incorrect program, then it is impossible for P to pass the mutation test.

In (017) the mutation metatheorem has been formally shown to hold where L is certain
classes of decision tables and the mutant operator involves the reformulation of conditions
and applied actions. Currently, programs which manipulate data structures are under
investigation.

For general purpose programming languages such as FORTRAN, the task is more difficult.
There is a noticeable lack of empirical studies on programming errors to draw on in
formulating a complete set of mutant operators - a necessary requirement for program
mutation to be deductive. Here, complete means that all simple errors will be captured
in m(P). Hence, at least for now, in the case of general purpose languages we can consider
program mutation to be an inductive tool for gaining confidence that the metatheorem
of program mutation holds for a particular program P. A prototype system for
a subset of FORTRAN will be overviewed below. Some initial experience with it, the
effectiveness of the implemented mutant operators and substantiations of the coupling
effect can be found in (013). A mutation system for nearly full ANSI FORTRAN has been
designed and is about half written. Several experiments for finding 'good' mutant operators
and for evaluating the effectiveness of mutation testing are under consideration.


PROGRAM  MUTATION  APPLIED 

We do not claim that program mutation can be used effectively by the novice programmer.
Rather (unlike previous software reliability methods) in program mutation we are making
and exploiting the following assumption:

Experienced programmers write programs which are either correct or are 'almost'
correct.

That  is,  in  the  mutation  terminology: 

If a program is not correct, then it is a 'mutant' - it differs from a correct program
by simple well-understood errors.

There  is  empirical  evidence  which  supports  this  natural  premise  (011,022). 

In order that it be feasible to perform the mutation test, the size of m(P) and T must
be small. Our view is that the mutation system should be interactive. The user specifies
the program P and initial test data T1 to the system, whereupon the mutant operators
are applied to P, thereby generating the mutants of P. The mutants are then executed
on T1. A list of mutants which fail and which succeed on T1 is produced. If all mutants
give incorrect results then, by the coupling effect and the experienced programmer
assumption, it is very likely that P is correct. On the other hand, if some mutants
are correct on T1, then the user must examine the results of the mutation run to
determine whether:

• P contains a non-radical error

• Because mutants which should have failed did not, T1 is inadequate and must be augmented
  to T2 and the system re-run

• Some mutants are equivalent to P. Currently, this must be done manually but there
  is hope that symbolic execution techniques can partially automate this task.

This cycle can be viewed as a series of interactive sessions in which the user defends
P and the current test data against a system adversary which asks questions of the form:

Why does your test data not distinguish this simple error?

Such an adversary forces the user of program mutation into a careful and detailed review
of his program and the design decisions made in constructing it. The issues which
the user must address include:

•  Which  mutant  operators  should  be  applied  to  the  program? 

•  Are  the  program  and  its  mutants  correct  on  the  given  test  data? 

•  Is  a  given  mutant  equivalent  to  the  program? 

In this view we hold hope that even radical errors can be uncovered by users of program
mutation.


OTHER  APPLICATIONS  OF  PROGRAM  MUTATION 

Several  approaches  to  aid  in  the  design,  implementation  and  debugging  of  large-scale 
software  have  recently  emerged.  Examples  are  restricted  modularization,  structured 
programming,  and  program  verification.  However  helpful  they  may  be  to  programmers  and 
low-level  managers,  the  effects  of  these  techniques  cannot  be  utilized  throughout  the 
project management hierarchy since they are qualitative rather than quantitative; managers
should not be expected to understand code and/or sophisticated mathematics.

Besides being a tool for determining adequate test data, program mutation also provides
the type of information that managers need to monitor software development and personnel
performance. By using an automated program mutation system with report generation capability,
managers may extract information during software development such as:

• Mutant failure percentages for each module indicating how close the software is to
  being acceptable

•  Who  is  responsible  for  classifying  which  mutants  as  equivalent 

•  Which  mutants  have  yet  to  fail. 
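
A report of this kind could be computed along the following lines. The sketch is
in Python and purely illustrative: the status labels, the function names and the
sample data are invented for the example, and leaving equivalent mutants out of
the percentage is our assumption rather than something the text specifies.

```python
def failure_percentage(status):
    """Percentage of non-equivalent mutants that have failed.

    `status` maps a mutant number to 'failed', 'live' or 'equivalent'.
    """
    counted = [s for s in status.values() if s != "equivalent"]
    if not counted:
        return 100.0          # nothing left to eliminate
    failed = sum(1 for s in counted if s == "failed")
    return 100.0 * failed / len(counted)

def report(modules):
    """Failure percentage per module, rounded to one decimal place."""
    return {name: round(failure_percentage(status), 1)
            for name, status in modules.items()}

# Hypothetical mutant statuses for two modules.
MODULES = {
    "FIND": {1: "failed", 2: "failed", 3: "failed", 4: "live"},
    "SCAN": {1: "failed", 2: "equivalent", 3: "live"},
}

summary = report(MODULES)
```

The same status table also answers the third question in the list, since the
mutants that have yet to fail are exactly those still marked 'live'.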

This  quantitative  information  can  be  used  in  several  ways  at  different  levels  in  the 
management  hierarchy.  Among  these  are: 

•  Re-assignment  of  personnel  to  work  on  modules  where  the  mutant  failure  rate  is  low 

• Pinpointing responsibility for modules which fail after having been deemed acceptable

•  Forced  justification  of  why  certain  equivalent  mutants  exist 

•  Monitoring  PERT  chart  adherence 

•  Rewarding  personnel  who  achieve  high  mutant  failure  percentages. 

See  (023)  for  a  description  of  how  program  mutation  can  be  integrated  with  the  chief 
programmer  management  concept. 

Government agencies and profit making industries are currently finding that purchasing
software from specialized software vendors is more economical than in-house development.
The contracts generally consist of the specifications for the software and a date on
which the software and test data on which the software meets the specifications are to
be delivered. Occasionally, some test data is given with the specifications. Two
problems for the contractor are apparent in this scheme:

• At any time during the contract period the purchaser has no indication as to how
  'close' the software is to being ready

• Upon delivery, although the software works correctly on the supplied test data,
  there is no way to measure the quality of the purchased software.

We  see  program  mutation  as  a  partial  solution  to  the  first  problem  and  as  a  definite 
solution  to  the  second. 

Since  program  testing  is  the  final  stage  of  software  development,  a  contractor  can 
specify  that  the  vendor  indicates  at  what  point  testing  commences.  Assuming  that  the 
vendor is using a mutation system, the contractor can monitor the final stage of development
by having the vendor periodically report mutant elimination percentages.

To  evaluate  the  delivered  software,  one  can  specify  in  contracts  that  the  test  data 
of  modules  must  eliminate  a  certain  percentage  of  the  mutants  with  respect  to  'standard' 
mutant  operators.  Here  there  are  many  options.  Software  not  passing  this  quality  test 
may  be  rejected  or  there  could  be  a  substantial  financial  penalty  to  the  vendor.  In 
this case it is not essential that the vendor uses a mutation system, only that the contractor
has one available to evaluate the final product. Also, note that the contractor
is not concerned with equivalent mutants; rather, a simple test (which can be entirely
computerized) dependent solely on the mutant operators is used. Currently, we have
little information on which mutant operators should be employed in this test; however,
experiments to answer this question are in progress.


THE  PROTOTYPE  FORTRAN  MUTATION  SYSTEM 

A prototype mutation system for a large subset of FORTRAN has been implemented as an
interactive system on the PDP-10. See (018) for a more detailed description than the
following. We chose FORTRAN as the source language in our first implementation of a
mutation system since there is a large body of existing programs on which we can experiment.
However, the methodology is language-independent - mutation systems for other
languages are in the design stage.

The programs considered are FORTRAN subroutines with the following data types and
statement types:

• Integer constants and variables

• One and two dimensional arrays

• GOTO statements

• CONTINUE statements

• ASSIGNMENT statements with general arithmetic expressions

• RETURN statements

• Logical IF statements with general relational and logical expressions

• DO loops with one level of embedding.

The mutant operators which the system can apply to a program fall into four categories:

• Declaration mutations. There are mutant operators to insert default array limits
  and to permute the limits of two-dimensional arrays.

• Data reference mutations. Data references are instances of constants, scalar variables,
  and references to one and two-dimensional arrays in the statements of the
  program. There are mutant operators to replace any data reference by any other
  reference in the program as well as an operator to replace constants by other constants
  not necessarily appearing in the program. Also, there is an operator to permute
  the index expressions of references to two-dimensional arrays.

• Operator evaluation mutations. There are mutant operators to replace occurrences
  of arithmetic operators in the program by all the other arithmetic operators. There
  are mutant operators to do likewise for relational and logical operators.

• Control mutations. There are mutant operators to replace the label portion of GOTO
  statements by each of the other statement labels appearing in the program. Also,
  there are mutant operators to see if all control paths of the program are traversed
  at least once, to force DO loops to end on CONTINUE statements, to force DO loops
  not to end on CONTINUE statements and to replace each statement of the program by a
  RETURN statement.
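
As an illustration of the second category, the sketch below generates data
reference mutants by replacing each variable read with every other variable name
in the program. It is not the prototype's implementation: it operates on Python
source through the standard ast module (ast.unparse requires Python 3.9 or
later), and the names ReplaceNthRead and data_reference_mutants are hypothetical.
In real Python code, names of called functions would also count as reads; the
sketch ignores that distinction.

```python
import ast

class ReplaceNthRead(ast.NodeTransformer):
    """Replace the n-th variable read in the program by `new_name`."""

    def __init__(self, n, new_name):
        self.n = n
        self.new_name = new_name
        self.seen = 0
        self.changed = False

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            if self.seen == self.n and node.id != self.new_name:
                node.id = self.new_name
                self.changed = True
            self.seen += 1
        return node

def data_reference_mutants(source):
    """Return the source text of every single-replacement mutant."""
    tree = ast.parse(source)
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    reads = sum(1 for n in ast.walk(tree)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load))
    mutants = []
    for i in range(reads):
        for name in names:
            transformer = ReplaceNthRead(i, name)
            mutated = transformer.visit(ast.parse(source))  # fresh tree each time
            if transformer.changed:
                mutants.append(ast.unparse(mutated))
    return mutants

result = data_reference_mutants("k = i + j")
```

Even this one-line program with three names produces four mutants, which gives
some feeling for why the number of mutants, and hence the size of the test data,
has to be kept small.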

As discussed above, these operators are designed to capture simple errors and to assess
the adequacy of the given test data to distinguish them. For example, in making an
array have dimension 1, one checks whether the test data is causing the array to be accessed
other than as a scalar.

The user specifies to the system his program, test data, and the mutant operators he
wishes to be applied. The system then generates and executes the mutants on the test
data and produces a report indicating which mutants are correct and which fail on the
given test data. Various profiles and other useful information are also reported. An
example of the report produced by the system is given in the appendix. The determination
of mutant correctness or failure is done in one of two ways:

•  By  direct  comparison  of  the  mutant  output  with  the  program's  output 

•  By  a  user  supplied  algorithm  which  examines  the  output  of  the  mutant. 

In  both  cases  the  system  asks  the  user  whether  or  not  the  program  is  acceptable  on  the 
test  data.  However,  determination  of  mutant  failure  is  done  by  the  system. 
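
The two ways of deciding mutant failure can be sketched as a single routine with
an optional user-supplied check. This is a Python illustration only; the name
mutant_status, the accept parameter and the small sorting example are assumptions
made for the sketch and do not describe the prototype's interface.

```python
def mutant_status(program, mutant, tests, accept=None):
    """Return 'failed' or 'correct' for `mutant` on the test data `tests`.

    With accept=None the mutant's output is compared directly with the
    program's output. Otherwise accept(x, y) is a user-supplied check that
    decides whether output y is acceptable for input x.
    """
    for x in tests:
        y = mutant(x)
        ok = (y == program(x)) if accept is None else accept(x, y)
        if not ok:
            return "failed"
    return "correct"

program = lambda a: sorted(a)
mutant = lambda a: sorted(a, reverse=True)
tests = [[3, 1, 2]]

# Direct comparison: the mutant's output differs from the program's.
direct = mutant_status(program, mutant, tests)

# A weaker user-supplied check: any rearrangement of the input is accepted.
is_rearrangement = lambda x, y: sorted(x) == sorted(y)
checked = mutant_status(program, mutant, tests, accept=is_rearrangement)
```

The example shows why the choice matters: direct comparison is the stricter
criterion, while a user-supplied check is needed when several different outputs
are all acceptable for the same input.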

Upon examining the report, the user may re-run the system and augment his test data in
an attempt to make the remaining mutants fail. He may also specify that additional
mutant operators be applied to the program. The system produces another report of the
same nature as the first for the user to examine. This cycle continues until the user is
satisfied that his current test data adequately tests his program or until an error in
the program is discovered.

The  prototype  FORTRAN  system  heavily  uses  the  PDP-10  file  system  to  record  transient 
information  such  as  mutant  correctness  status  and  the  current  test  data  between  runs. 

In spite of the fact that the program terminates on the test data, some mutants may
actually be non-terminating. To handle this the system records the program's execution
time for each test case and deems that a mutant has failed due to being non-terminating
if the mutant has not terminated within a factor of the program's execution time.
Experience has shown that a factor of 10 is reasonable.
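
The timeout rule can be sketched in a few lines. The sketch below is an illustration
only, not the prototype system's code: it assumes that cost is measured in loop
iterations rather than CPU time, and the names StepCounter, original, mutant and
classify are hypothetical.

```python
class StepBudgetExceeded(Exception):
    """Raised when a run uses more steps than its budget allows."""


class StepCounter:
    """Counts loop iterations; optionally enforces an upper limit."""

    def __init__(self, limit=None):
        self.steps = 0
        self.limit = limit

    def tick(self):
        self.steps += 1
        if self.limit is not None and self.steps > self.limit:
            raise StepBudgetExceeded()


def original(n, counter):
    """Sum 1..n with an explicit loop."""
    total, i = 0, 1
    while i <= n:
        counter.tick()
        total += i
        i += 1
    return total


def mutant(n, counter):
    """Same loop, but the increment constant 1 was mutated to 0."""
    total, i = 0, 1
    while i <= n:
        counter.tick()
        total += i
        i += 0  # never advances, so the loop cannot end on its own
    return total


def classify(program, candidate, test_input, factor=10):
    """Run candidate with a budget of factor times the program's own cost."""
    base = StepCounter()
    expected = program(test_input, base)
    budget = StepCounter(limit=factor * max(base.steps, 1))
    try:
        result = candidate(test_input, budget)
    except StepBudgetExceeded:
        return "timed-out"
    return "alive" if result == expected else "wrong answer"


print(classify(original, mutant, 5))
```

Counting steps instead of seconds keeps the sketch deterministic; a real system
would apply the same factor to measured execution time.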


SOME  INITIAL  EXPERIENCES  WITH  PROGRAM  MUTATION 

The results of using the prototype FORTRAN mutation system on three programs are now
described. The first is Hoare's FIND program (014) which, given an integer array a
of dimension n and an array index f, rearranges a such that a(1)...a(f-1) are no greater
than a(f) and a(f+1)...a(n) are no less than a(f). The second is the Knuth, Morris, and
Pratt PAT program (015,016) which, given two arrays of integers, decides whether the
first array occurs in the second. The third, SCAN, is the scanner used in the prototype
system itself.

For all three programs, the testing strategy was the following. We first constructed
what we believed would be good test data for the programs independently of the mutation
system. The program and this initial test data were input to the system with all
implemented mutant operators in effect. The results of these initial runs are summarised
in Figure 1.


PROGRAM   EXECUTABLE   NUMBER OF    NUMBER OF   PERCENTAGE OF
          STATEMENTS   TEST CASES   MUTANTS     INCORRECT MUTANTS

FIND          34           24          758           92.2
PAT           42            9         1178           77.2
SCAN         104           19         8838           89.1

Figure 1: Initial mutation run


We then made mutation runs with augmented test data until all mutants either failed on
some test case or were determined equivalent. The final results are shown in Figure 2.


PROGRAM   NUMBER OF RUNS   NUMBER OF TEST CASES   % OF INCORRECT MUTANTS

FIND            8                  49                     98.1
PAT             9                  35                     98.7
SCAN            7                  35                     97.9

Figure 2: Initial mutation run

In comparing the mutant elimination percentages of Figure 1 to Figure 2, we can
demonstrate one reason why program testing as an art has been held in such low esteem:

With  all  three  programs,  even  after  hard  thought,  our  initial  test  data  failed  to 
distinguish  a  large  number  of  incorrect  mutants. 

Although the initial mutant elimination percentages in Figure 1 seem adequate,
correlated with Figure 2, we see that the initial test data failed to distinguish 44
incorrect mutants of FIND, 253 incorrect mutants of PAT, and 778 incorrect mutants of
SCAN. The final mutant report for SCAN appears in the appendix.
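
The counts quoted above can be approximately recovered from the two figures. The
following sketch is only a consistency check under the assumption that each count is
the number of mutants times the difference between the final and initial elimination
percentages; because the published percentages are rounded to one decimal place, the
estimates agree with the reported counts only to within one mutant. The function name
is hypothetical.

```python
# (program, mutants, initial % eliminated, final % eliminated, count quoted in the text)
RUNS = [
    ("FIND", 758, 92.2, 98.1, 44),
    ("PAT", 1178, 77.2, 98.7, 253),
    ("SCAN", 8838, 89.1, 97.9, 778),
]


def missed_by_initial_data(mutants, initial_pct, final_pct):
    """Estimate how many non-equivalent mutants the initial test data left alive."""
    return mutants * (final_pct - initial_pct) / 100.0


estimates = {
    name: missed_by_initial_data(mutants, initial_pct, final_pct)
    for name, mutants, initial_pct, final_pct, _ in RUNS
}

for name, _, _, _, reported in RUNS:
    print(f"{name}: estimated {estimates[name]:.1f}, reported {reported}")
```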

The reason FIND did so well initially is our choice of all permutations of 1-4
with F fixed at 3 for the initial data. Permutations are a reasonable test of programs
like FIND but they fail to distinguish all mutants (see (013)). Higher dimensioned
permutations do no better. We have run 1000 uniformly drawn random permutations of sizes
5 and 6 as test data for FIND and they failed to distinguish 39 incorrect mutants of
FIND. The reason is that permutations of a are legal FORTRAN indices of a, and not until
negative data is used do these mutants of FIND fail. This is analogous to mixing
pointers and the values pointed at, which is a common programming blunder. The insight
gained is that test data for pointer type programs should be constructed so that values
pointed at are not legal pointers. See (013) for other such insights for test data
selection that have been gained from program mutation as well as for a more detailed
discussion on the pitfalls of using random test data.
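
A small sketch may make the pointer insight concrete. It does not use FIND; it assumes
a hypothetical routine that sums an array, together with a hypothetical mutant in which
the subscript i has been replaced by the array reference a(i). On any permutation of
1..n the mutant adds up the same values in a different order, so no permutation can
distinguish it; data containing a value that is not a legal subscript makes it fail at
once.

```python
from itertools import permutations


def total(a):
    """Original: sum of a(1)..a(n), using 1-based subscripts as in FORTRAN."""
    return sum(a[i - 1] for i in range(1, len(a) + 1))


def total_mutant(a):
    """Mutant: the subscript i is replaced by the array reference a(i)."""
    result = 0
    for i in range(1, len(a) + 1):
        subscript = a[i - 1]
        if not 1 <= subscript <= len(a):
            raise IndexError("subscript range error")
        result += a[subscript - 1]
    return result


def distinguishes(a):
    """True if test case a makes the mutant fail or give a different answer."""
    try:
        return total_mutant(a) != total(a)
    except IndexError:
        return True


# Every permutation of 1..4 is a legal set of subscripts, so the mutant survives.
survives_all_permutations = not any(
    distinguishes(list(p)) for p in permutations(range(1, 5))
)
print(survives_all_permutations, distinguishes([-1, 2, 3]))
```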

Figures 1 and 2 suggest that 2% might be a good estimate for the expected proportion of
equivalent mutants that a program will have, at least for the mutant operators
implemented in the prototype FORTRAN mutation system. If one accepts this estimate, then
eliminating better than (say) 97% of all mutants without trying to determine equivalent
mutants allows one to gain high confidence in test data adequacy.

Another observation is that the number of mutants of a program appears bounded by cn²,
where n is the number of statements in the program and c<1. This compares favourably
with other methodologies for achieving reliable software which all seem to have
inherent exponential growth factors. In fact, the unclever prototype FORTRAN mutation
system took 90 minutes of CPU time on the PDP-10 KA-10 processor to run the 8838 mutants
of the 104 statement scanner program. The KA-10 is 5 times slower than the IBM 370/158
and 30 times slower than the CDC 7600. Because our system is CPU bound, the 90 minutes
of CPU time scales down directly to 18 and 3 minutes on these faster machines.
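
The growth bound and the timing figures can be checked against the numbers in Figure 1.
The sketch below assumes the bound is read as quadratic in the number of statements
(mutants <= c * n**2), which is the reading consistent with the reported mutant counts;
the names are hypothetical.

```python
# program: (executable statements, number of mutants), taken from Figure 1
PROGRAMS = {"FIND": (34, 758), "PAT": (42, 1178), "SCAN": (104, 8838)}


def quadratic_constant(statements, mutants):
    """The constant c in mutants = c * statements**2."""
    return mutants / statements ** 2


constants = {name: quadratic_constant(n, m) for name, (n, m) in PROGRAMS.items()}

# CPU-bound work scales inversely with relative processor speed.
KA10_MINUTES = 90
SPEEDUP_OVER_KA10 = {"IBM 370/158": 5, "CDC 7600": 30}
scaled_minutes = {
    machine: KA10_MINUTES / factor for machine, factor in SPEEDUP_OVER_KA10.items()
}

for name, c in constants.items():
    print(f"{name}: c = {c:.2f}")
print(scaled_minutes)
```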

We are currently working on a 300-line auditing program taken from a production
environment. We see no reason why any FORTRAN module cannot be tested on a mutation
system within acceptable cost-effective CPU times.


CONCLUSIONS 

Program  mutation  is  an  engineering  approach  to  program  testing  where  the  goal  is  to 
establish  that  a  program  is  either  correct  or  is  radically  incorrect.  The  method  is 
based  on  the  coupling  effect:  simple  mutations  are  sufficient  to  distinguish  complex 
mutations.  Initial  experience  has  suggested  the  validity  of  the  coupling  effect. 

The  effectiveness  of  program  mutation  depends  on  two  factors: 

•  Human  judgement 

•  The implemented mutant operators.

In  the  former  case  we  have  suggested  how  mutation  systems  can  be  designed  to  aid  users 
in  meeting  the  goals  of  program  mutation.  In  the  latter  case  we  are  currently  running 
experiments  to  evaluate  the  mutant  operators  implemented  in  the  prototype  system  and 
to  develop  'good'  mutant  operators  for  future  mutation  systems. 




REFERENCES 


001  RAMAMOORTHY C V, HO S F and CHEN W T
     On the automated generation of program test data
     IEEE Trans on Software Eng vol 2 no 1 pp 293-300 (Dec 1976)

002  HOWDEN W E
     Methodology for the generation of program test data
     IEEE Trans on Computers vol 24 no 3 pp 554-560 (May 1975)

003  HOWDEN W E
     Reliability of the path analysis testing strategy
     IEEE Trans on Software Eng vol 2 no 3 pp 208-214 (Sept 1976)

004  GOODENOUGH J B and GERHART S L
     Towards a theory of test data selection
     IEEE Trans on Software Eng vol 1 no 2 pp 156-173 (June 1975)

005  HUANG J C
     An approach to program testing
     Comp Surv vol 7 no 3 pp 113-128 (Sept 1975)

006  MILLER E F and MELTON R A
     Automated generation of test case datasets
     Proc 1st Intl Conf on Reliable software

007  CLARKE L
     A system to generate test data and symbolically execute programs
     IEEE Trans on Software Eng vol 2 no 3 pp 215-222 (Sept 1976)

008  KING J
     Symbolic execution and program testing
     CACM vol 19 no 7 pp 385-394 (July 1976)

009  LONDON R
     The current state of proving programs correct
     Proc ACM Nat Conf New York (1972)

010  HANTLER S and KING J
     An introduction to proving the correctness of programs
     Comp Surv vol 8 no 3 pp 331-353 (Sept 1976)

011  YOUNGS E A
     Human errors in programming
     Intl J of Man Machine Studies no 6 pp 361-376 (1974)

012  BOEHM B
     Software design and structuring
     In Practical strategies for developing large software systems
     Horowitz (ed) Addison-Wesley (1975)

013  DeMILLO R, LIPTON R and SAYWARD F
     Hints on test data selection
     Computer (April 1978)

014  HOARE C
     Algorithm 65: FIND
     CACM vol 4 no 1 p 321 (April 1961)

015  MORRIS J and PRATT V
     A linear pattern matching algorithm
     Tech rep 40 Comp Centre Univ of California Berkeley (1970)

016  KNUTH D and PRATT V
     Automata theory can be useful
     Stanford Univ Tech rep (1971)

017  BUDD T and LIPTON R
     Mutation analysis of decision table programs
     Conf on Information sciences and systems Johns Hopkins Univ (1978)

018  BUDD T, DeMILLO R, LIPTON R and SAYWARD F
     The design of a prototype mutation system for program testing
     NCC (1978)

019  DeMILLO R, LIPTON R and PERLIS A
     Social processes and proofs of theorems and programs
     4th ACM Symp on Principles of programming languages (1977)

020  GOODENOUGH J
     A survey of program testing issues
     In Research directions in software technology P Wegner (ed)
     MIT Press (1978)

021  CHANG H Y et al
     Fault diagnosis of digital systems
     Wiley-Interscience (1970)

022  BOEHM B W
     Software engineering
     IEEE Trans on Computers vol 25 no 12 pp 1226-1241 (Dec 1976)

023  DeMILLO R, LIPTON R and SAYWARD F
     Program mutation as a tool for managing large scale software development
     ASQC Tech Conf (1978)



APPENDIX


MUTATION STATUS REPORT   SCAN.F4   23-Oct-77


LISTING OF THE PROGRAM BEING MUTATED

      SUBROUTINE SCAN(RECORD,N,TYPE,KIND,ID,EOL,IVAL,COLUMN)
C
C  1.  SCAN THE RECORD, STARTING AT LOCATION N AND RETURN THE
C      IDENTIFIER, CONSTANT OR SYMBOL.
C
C  2.  RECORD IS AN 80 WORD ARRAY, ASSUMED 1 CHAR TO A WORD.
C      LOCATION TO START SCAN. ON RETURN N POINTS TO LAST
C      SCANNED + 1.
C
C  3.  RETURNS INFORMATION IN COMMON/SCANNER/
C      TYPE   - TYPE OF OBJECT FOUND (SEE SCANER.PAR)
C      KIND   - SUBCLASS OF TYPE
C      ID(2)  - CHARACTER FORM OF IDENTIFIER FOUND, PADDED
C               BLANKS
C      EOL    - SET TRUE IF END OF LINE WAS FOUND BEFORE A
C               WAS
C      IVAL   - INTEGER VALUE FOUND
C      COLUMN - COLUMN IN WHICH LOCATED OBJECT BEGAN
C
C      INTERNAL VARIABLES USED
C      CH     - CURRENT CHARACTER
C      K      - LOOP COUNTER
C      IDB    - UNPACKED ID BUFFER
C      LOG    - SYMBOLIC LOGICAL OPERATORS
C      LTYPE  - TYPE FOR EACH OF THE ABOVE
C      LKIND  - KIND FOR EACH OF THE ABOVE
C      SYMC   - CHARACTER CODE FOR SYMBOLS TO BE RECOGNIZED
C      STYPE  - TYPE FOR EACH OF THE ABOVE
C      SKIND  - KIND FOR EACH OF THE ABOVE
C
C  4.  NONE
C
C  5.  NONE
C
C  6.  TRAPON TO TRAP INTEGER OVERFLOWS.
C
C  7.  ENCODE TO PACK IDENTIFIER
C      LINE 32- ASSUMES CHAR NUMBER = CHAR/2**29
C
C  8.  ASSUMES 5 CHARACTERS TO A WORD
C
C  9.  NONE
C
C  10. TIM BUDD, JULY 29, 1977
C
C  11.
C
C  12.


C
C
C     INCLUDE 'SCANER.PAR/NOLIST'
C .... TESTED BY MANUALLY CHANGING PARAMETERS IN BODY OF CODE
C
C     PARAMETER SPACE=1H ,A=1HA,Z=1HZ,C0=1H0,C9=1H9,PERIOD=1H.,
C    *          STAR=1H*
C .... ALSO TESTED AS ABOVE
C
C
      INTEGER N,RECORD(12)
C
C     INCLUDE 'SCANER.COM/NOLIST'
C .... TESTED BY MAKING INTO PARAMETERS
      INTEGER TYPE,KIND,ID(2),EOL,IVAL,COLUMN
C
C
C
      INTEGER CH,K,IDB(10),SYMC(8),
     *        STYPE(8),SKIND(8)
C
C .... DATA STATEMENTS SIMULATED BY ASSIGNMENTS
C     DATA SYMC/'(',')',',','=','+','-','/',''''/, STYPE/ LPARN,RPARN
C    *   ,COMMA,BECOMS,2*ADDOP,MULOP,APOST /,SKIND/ 4*NOKIND,PLUS,
C    *   MINUS,DIVIDE,NOKIND/
      SYMC(1) = 40
      SYMC(2) = 41
      SYMC(3) = 44
      SYMC(4) = 61
      SYMC(5) = 43
      SYMC(6) = 45
      SYMC(7) = 47
      SYMC(8) = 96
      STYPE(1) = 4
      STYPE(2) = 9
      STYPE(3) = 8
      STYPE(4) = 10
      STYPE(5) = 5
      STYPE(6) = 5
      STYPE(7) = 6
      STYPE(8) = 15
      SKIND(1) = 1
      SKIND(2) = 1
      SKIND(3) = 1
      SKIND(4) = 1
      SKIND(5) = 2
      SKIND(6) = 3
      SKIND(7) = 5
      SKIND(8) = 1
C
C ---
      TYPE = 1
      KIND = 1


      ID(1) = 32
      ID(2) = 32
      EOL = 0
      IVAL = 0
C
C  SKIP OVER LEADING SPACES
C
   10 IF (N.GE.13) GOTO 110
      IF (RECORD(N).NE.32) GOTO 20
      N = N + 1
      GOTO 10
C
C  NOW HAVE NON-NULL CHAR
C
   20 CH = RECORD(N)
      COLUMN = N
C
C  READ AN IDENTIFIER
C
      IF ((CH.LT.65).OR.(CH.GT.90)) GOTO 30
      K = 0
   22 IF (K.GE.10) GOTO 23
      K = K + 1
      IDB(K) = CH
   23 N = N + 1
      IF (N.LT.13) GOTO 24
      CH = 64
      GOTO 25
   24 CH = RECORD(N)
   25 IF (((CH.GE.64).AND.(CH.LE.90)).OR.
     *    ((CH.GE.48).AND.(CH.LE.57)))
     *    GOTO 22
C  PAD WITH BLANKS
   26 IF (K.GE.10) GOTO 28
      K = K + 1
      IDB(K) = 64
      GOTO 26
C  28 ENCODE(10,999,ID) IDB
   28 DO 29 K=1,5
      ID(1) = ID(1) * 10 + IDB(K)
   29 ID(2) = ID(2) * 10 + IDB(K+5)
      TYPE = 2
      GOTO 90
C
C  READ A NUMBER
C
   30 IF ((CH.LT.48).OR.(CH.GT.57)) GOTO 40
   32 IVAL = IVAL * 10 + (CH - 48)
      N = N + 1
      IF (N.EQ.13) GOTO 34
      CH = RECORD(N)
      IF ((CH.GE.48).AND.(CH.LE.57)) GOTO 32
   34 TYPE = 3
      GOTO 90


C
C  READ A PERIOD (OR A LOGICAL EXPR).
C
   40 IF (CH.NE.46) GOTO 50
      TYPE = 16
      GOTO 90
C
C  READ A STAR
C
   50 IF (CH.NE.42) GOTO 60
      TYPE = 6
      KIND = 4
      N = N + 1
      IF (N.EQ.13) GOTO 90
      CH = RECORD(N)
      IF (CH.NE.42) GOTO 90
      N = N + 1
      TYPE = 7
      KIND = 1
      GOTO 90
C
C  READ (),=+-/'
C
   60 DO 62 K=1,8
   62 IF (CH.EQ.SYMC(K)) GOTO 64
C  FALL THROUGH LOOP => SYMBOL ERROR
      GOTO 120
   64 TYPE = STYPE(K)
      KIND = SKIND(K)
      N = N + 1
C     GOTO 90
C
C  CORRECT EXIT POINT
C
   90 RETURN
C
C
C  ERRORS-
C  ERROR 110. END OF LINE
  110 EOL = 1
      COLUMN = 13
      GOTO 90
C
C  ERROR 120. INVALID SEQUENCE OF SYMBOLS
  120 GOTO 90
      END


CLASSIFICATION OF THE PROGRAM'S FORMAL PARAMETERS

STRICTLY OUTPUT PARAMETERS
     TYPE  KIND  ID  EOL  IVAL  COLUMN

INPUT AND OUTPUT PARAMETERS
     N

READ ONLY INPUT PARAMETERS
     RECORD


THE METHOD OF DETERMINING MUTANT CORRECTNESS IS
BY COMPARISON TO THE PROGRAM


THE PIMS RUN TITLE

BEFORE THIS RUN THERE WERE 6 PIMS RUNS ON THIS PROGRAM
8838 MUTANTS WERE CREATED DURING THOSE RUNS

0 NEW MUTANTS WERE CREATED DURING THIS RUN

FOR A GRAND TOTAL OF 8838 MUTANTS


MUTANT'S STATUS BEFORE THIS RUN

A TOTAL OF   29 TEST CASES
A TOTAL OF 8838 MUTANTS
OF THESE    206 ARE STILL ALIVE

THERE ARE 104 PROGRAM STATEMENTS,
GIVING 84.98 MUTANTS PER STATEMENT


MUTANT PROFILE

ARRAY LIMIT DEFAULT INSERTION         6      0
SCALAR VARIABLE REPLACEMENT         539      2
SCALAR VAR FOR CONSTANT REPLMT      872     34
CONSTANT FOR SCALAR VAR REPLMT     1320      0
COMPARABLE ARRAY NAME REPLMT        210      0
CONST FOR ARRAY REF REPLACEMENT     360      0
SCALAR VAR FOR ARR REF REPLMT       336      0
ARRAY REF FOR CONST REPLACEMNT     2368    111
ARR REF FOR SCALAR VAR REPLMT      2016     12
ARITHMETIC OPERATOR REPLACEMNT       64      0
RELATIONAL OPERATOR REPLACEMNT      105     15
LOGICAL CONNECTOR REPLACEMENT         6      0
GOTO LABEL REPLACEMENT              427     21
PATH ANALYSIS                       104      0
CONTINUE STATEMENT INSERTION          2      2
RETURN STATEMENT INSERTION          103      9

THE PERCENTAGE OF ELIMINATED MUTANTS IS 97.67

THE ELIMINATION PROFILE FOR ALL MUTANTS IS

TYPE OF ELIMINATION                 NUMBER OF ELIMINATED MUTANTS

TIMED-OUT                                  310
REFERENCED AN UNDEFINED VARIABLE          2034
SUBSCRIPT RANGE ERROR                     1071
DIVIDED BY ZERO                              0
ARITHMETIC OVERFLOW OR UNDERFLOW            63
WROTE A READ ONLY VARIABLE                  58
EXECUTED A TRAP STATEMENT                  104
PRODUCED WRONG ANSWERS                    4992

MUTANT STATUS AFTER THIS RUN

A TOTAL OF   35 TEST CASES
A TOTAL OF 8838 MUTANTS
OF THESE    190 ARE STILL ALIVE

THERE ARE 104 PROGRAM STATEMENTS,
GIVING 84.98 MUTANTS PER STATEMENT

MUTANT PROFILE

ARRAY LIMIT DEFAULT INSERTION         6      0
SCALAR VARIABLE REPLACEMENT         539      2
SCALAR VAR FOR CONSTANT REPLMT      872     33
CONSTANT FOR SCALAR VAR REPLMT     1320      0
COMPARABLE ARRAY NAME REPLMT        210      0
CONST FOR ARRAY REF REPLACEMNT      360      0
SCALAR VAR FOR ARR REF REPLMT       336      0
ARRAY REF FOR CONST REPLACEMNT     2368    111
ARR REF FOR SCALAR VAR REPLMT      2016     12
ARITHMETIC OPERATOR REPLACEMNT       64      0
RELATIONAL OPERATOR REPLACEMNT      105      6
LOGICAL CONNECTOR REPLACEMENT         6      0
GOTO LABEL REPLACEMENT              427     15
PATH ANALYSIS                       104      0
CONTINUE STATEMENT INSERTION          2      2
RETURN STATEMENT INSERTION          103      9

THE PERCENTAGE OF ELIMINATED MUTANTS IS 97.85

THE ELIMINATION PROFILE FOR ALL MUTANTS IS

TYPE OF ELIMINATION                 NUMBER OF ELIMINATED MUTANTS

TIMED-OUT                                  310
REFERENCED AN UNDEFINED VARIABLE          2035
SUBSCRIPT RANGE ERROR                     1073
DIVIDED BY ZERO                              0
ARITHMETIC OVERFLOW OR UNDERFLOW            63
WROTE A READ ONLY VARIABLE                  58
EXECUTED A TRAP STATEMENT                  104
PRODUCED WRONG ANSWERS                    5005
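
The figures in the report are mutually consistent, which also suggests how its two
unlabelled profile columns should be read. The sketch below assumes the first column is
the number of mutants created by each operator and the second the number still alive;
that reading is an inference from the totals, not something the report states. The
operator names are shortened and the variable names are hypothetical.

```python
# (operator, mutants created, alive before this run, alive after this run)
PROFILE = [
    ("array limit default insertion", 6, 0, 0),
    ("scalar variable replacement", 539, 2, 2),
    ("scalar var for constant", 872, 34, 33),
    ("constant for scalar var", 1320, 0, 0),
    ("comparable array name", 210, 0, 0),
    ("const for array ref", 360, 0, 0),
    ("scalar var for array ref", 336, 0, 0),
    ("array ref for const", 2368, 111, 111),
    ("array ref for scalar var", 2016, 12, 12),
    ("arithmetic operator", 64, 0, 0),
    ("relational operator", 105, 15, 6),
    ("logical connector", 6, 0, 0),
    ("goto label", 427, 21, 15),
    ("path analysis", 104, 0, 0),
    ("continue statement insertion", 2, 2, 2),
    ("return statement insertion", 103, 9, 9),
]

created = sum(row[1] for row in PROFILE)
alive_before = sum(row[2] for row in PROFILE)
alive_after = sum(row[3] for row in PROFILE)


def percent_eliminated(total, alive):
    """Percentage of mutants that have failed on at least one test case."""
    return round(100.0 * (total - alive) / total, 2)


print(created, alive_before, alive_after)
print(percent_eliminated(created, alive_before), percent_eliminated(created, alive_after))
```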




Mutation  Analysis 

Timothy A. Budd, Richard J. Lipton,
Richard A. DeMillo, and Frederick G. Sayward

Research  Report  #155 


April  1979 


Supported  in  part  by  the  Office  of  Naval  Research  under  Grant 
N00014-75-C-0752,  the  Army  Research  Office  under  Grant 
DAAG  29-78-G-0121,  and  the  National  Science  Foundation  under 
Grant  MCS-780-7291. 




Mutation  Analysis 

Timothy  A.  Budd 
Richard  J.  Lipton 

Computer  Science  Division 
University  of  California, 

Berkeley,  CA  94720 

Richard  A.  DeMillo 

School  of  Information  and  Computer  Science 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 

Frederick  G.  Sayward 

Computer  Science  Department 
Yale  University 
New Haven, CT 06520


ABSTRACT 

A new type of software test is introduced, called mutation
analysis. A method for applying mutation analysis is described,
and the results of several experiments to determine its
effectiveness are given. Finally it is shown how mutation analysis
can subsume or augment many of the more traditional program
testing techniques.


1.  Introduction 

Traditionally, program testing has been an ad hoc technique done
by all programmers: the programmer creates test data which he
intuitively feels captures the salient features of the program, observes
the program in execution on the data, and if the program works on the
data (i.e., passes his test) he then concludes the program is correct.
Just as most programmers have tested programs in this manner, most
programmers have also deemed to be correct programs which were
indeed incorrect.

Modern testing techniques attempt to augment the programmer's
intuition by providing quantitative information on how well a program
is being tested by the given test data. Certainly the sheer number of
test cases is not sufficient to significantly increase our confidence in
the correct functioning of a program. If all the test cases exercise the
program in roughly the same way then nothing has been gained over a
smaller number of executions. The key idea of modern testing
techniques is to exercise the program under a variety of different
circumstances, thereby giving the programmer a greater confidence in
the correct functioning of the software component.


Several popular testing techniques use an idea called a covering
measure. Examples of covering measures are: the number of statements
executed, the number of branch outcomes taken, or the number of
paths traversed by the test cases. Test data with high coverage
measures then exercise the program more thoroughly (according to the
criterion) than ones with low measure.
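
As an illustration of a covering measure, the sketch below records which outcomes of a
small routine a test set reaches and reports the fraction covered. It is a simplified
stand-in for branch-outcome coverage with hand-placed probes; the routine
triangle_kind, the outcome labels and the function outcome_coverage are hypothetical,
and real coverage tools instrument the program automatically.

```python
ALL_OUTCOMES = {"not a triangle", "equilateral", "isosceles", "scalene"}


def triangle_kind(a, b, c, taken):
    """Classify three side lengths and record the outcome reached in `taken`."""
    if a + b <= c or b + c <= a or a + c <= b:
        kind = "not a triangle"
    elif a == b and b == c:
        kind = "equilateral"
    elif a == b or b == c or a == c:
        kind = "isosceles"
    else:
        kind = "scalene"
    taken.add(kind)
    return kind


def outcome_coverage(tests):
    """Fraction of the possible outcomes exercised by the given test cases."""
    taken = set()
    for a, b, c in tests:
        triangle_kind(a, b, c, taken)
    return len(taken) / len(ALL_OUTCOMES)


print(outcome_coverage([(3, 4, 5)]))
print(outcome_coverage([(3, 4, 5), (2, 2, 3), (1, 1, 1), (1, 2, 3)]))
```

A test set can reach full coverage under such a measure and still leave many simple
mutants alive, which is the gap the mutation criterion is meant to address.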

In this paper we will discuss a new type of testing method, program
mutation, which differs significantly from those previously mentioned.
Numerous theoretical and empirical studies [1,2,4,5] indicate
that data satisfying this test criterion often perform significantly
better in discovering errors and validating programs than data
satisfying other criteria. In many cases, the new test will actually subsume
the goals which have been earlier investigated.

2.  Description  of  the  Method 

Mutation  analysis  starts  with  one  important  assumption  which  is 
surprisingly  not  often  recognized: 

experienced  programmers  write  programs  which  are  either 

correct  or  are  almost  correct. 

(One manifestation of this is the common programmer's joke that the
code is always "90%" finished.)

The mutation method can be explained as follows: Given a program
P which performs correctly on some test data T, subject the program
to a series of mutant operators, thereby producing mutant programs
which differ from P in very simple ways. For example, if

I = I + 1

is a statement in P, then

I = I - 1
I = I + 2
I = J + 1

are all simple changes which lead to three mutants of P.

The mutant programs are then executed on T. If each mutant program
produces an answer which differs from the original on at least
one test case, then the mutation test for P is passed. If, as is more
likely, some of the mutants produce the same answers as the original
program on all the test cases submitted, then either

1)  the mutant programs are equivalent to P, or

2)  the test data T is inadequate for passing the mutation test and
must be augmented.

In  this  case  the  original  program  must  then  be  examined  with  the 
list  of  live  mutants  in  order  to  derive  test  data  on  which  some  or  all  of 
the  remaining  mutants  will  fail.  The  degree  of  testing  is  then  measured 
in  terms  of  the  number  (or  percentage)  of  mutants  which  have  been 
eliminated  by  the  test  data. 
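
The procedure can be sketched for the statement I = I + 1 and its three mutants. This is
a minimal illustration, not the mutation system itself: each mutant is written out by
hand as a function, a test case is a pair of values for I and J, and the names
live_mutants and mutation_score are hypothetical.

```python
def original(i, j):
    """The statement I = I + 1, returning the new value of I."""
    return i + 1


# The three mutants from the text, written out by hand.
MUTANTS = {
    "I = I - 1": lambda i, j: i - 1,
    "I = I + 2": lambda i, j: i + 2,
    "I = J + 1": lambda i, j: j + 1,
}


def live_mutants(tests):
    """Names of mutants that agree with the original on every test case."""
    return [
        name
        for name, mutant in MUTANTS.items()
        if all(mutant(i, j) == original(i, j) for i, j in tests)
    ]


def mutation_score(tests):
    """Fraction of mutants eliminated by the test data."""
    return 1 - len(live_mutants(tests)) / len(MUTANTS)


print(live_mutants([(0, 0)]))
print(live_mutants([(0, 0), (1, 0)]))
```

With the single test case (0, 0) the mutant I = J + 1 stays alive because I and J hold
the same value; adding a case in which they differ eliminates it.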

As an intuitive aid one can think of the mutation system as
proposing alternatives to the given program and asking the programmer
for reasons, in the form of test cases, as to why the alternatives are not
just as effective as the original program in solving the given task. This
then ensures that the program is correct relative to small
perturbations in its structure.




At first glance, however, it would appear that a program and test
data which passed this test might still contain some complex errors
which are not explicitly mutations of P. To this end there is a coupling
effect which states:

test  data  on  which  all  simple  mutants  fail  is  so  sensitive  to 
changes  in  the  program,  that  it  is  highly  likely  that  all  complex 
mutants  must  also  fail 

By a complex mutant we mean the transformation which takes the
original incorrect program into the presumed correct version. Since
any such correct program would therefore be differentiated from P by T,
if P truly executes correctly on T there can be no complex mutants, and
hence P is correct.

Several experiments substantiating the coupling effect have been
conducted [1,4]. Some of these will be described in the following
sections. The DAVE group [15,16] at the University of Colorado has also
observed that the ability to detect simple errors is often useful in
ensuring against quite complex errors. The types of simple errors
considered in mutation analysis are, however, much more extensive than
those considered by DAVE.

Constant Replacement (± 1)
Scalar for Constant Replacement
Source Constant Replacement
Array Reference for Constant Replacement
Scalar Variable Replacement
Constant for Scalar Replacement
Array Reference for Scalar Replacement
Comparable Array Name Replacement
Constant for Array Reference Replacement
Scalar for Array Reference Replacement
Array Reference for Array Reference Replacement
Arithmetic Operator Replacement
Relational Operator Replacement
Logical Connector Replacement
Unary Operator Removal
Unary Operator Replacement
Unary Operator Insertion
Statement Analysis
Statement Deletion
Return Statement Replacement
Goto Statement Replacement
Do Statement Replacement

Figure 1


3.  The  System 

A system has been constructed which performs mutation analysis
on sets of subroutines written in ANSI FORTRAN. The system is
interactive and iterative, so that the user presents the system with a
program and an initial test set. After constructing and executing each
mutant serially the system responds with summaries and reports on the
number and type of mutants which remain (i.e. which produced the
same result as the original program). The user can then augment the
test data set and reexecute the remaining mutants on the new test
cases. This process can continue until the desired level of testing is
attained.
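
The iterative cycle can be sketched as a function that executes only the mutants still
alive on the newly added test cases. The example assumes a hypothetical absolute-value
routine with three hand-written mutants; the names run_session, MUTANTS and the mutant
labels are illustrative and do not reflect the system's own interface.

```python
def original(x):
    """Absolute value of an integer."""
    return x if x >= 0 else -x


MUTANTS = {
    "GE to GT": lambda x: x if x > 0 else -x,  # equivalent: -0 equals 0
    "drop negation": lambda x: x,
    "negate always": lambda x: -x,
}


def run_session(live, new_tests):
    """Execute only the live mutants on the new test cases; return the survivors."""
    return {
        name
        for name in live
        if all(MUTANTS[name](x) == original(x) for x in new_tests)
    }


live = set(MUTANTS)
live = run_session(live, [3])   # initial test data
after_first = set(live)
live = run_session(live, [-2])  # user augments the test data
after_second = set(live)
live = run_session(live, [0])   # no further progress: survivor is equivalent
print(after_first, after_second, live)
```

The mutant that survives every session here is equivalent to the original program, so
no test case can eliminate it; deciding that is left to the user.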

The mutant operators used in the current system are shown in
Figure 1. The names are fairly self explanatory; for example, the three
mutations given in section 2 are produced by arithmetic operator
replacement, constant replacement, and scalar variable replacement,
respectively.

Various versions of the mutation system have been in operation
for about two years [2], and in that period numerous experiments have
been conducted investigating the coupling effect and the utility of the
tool for program development and testing [5]. The next section details
some experiments performed which substantiate the coupling effect.

4.  The  Coupling  Effect 

We have already reported on an experiment [4] involving Hoare's
FIND program [9] that supplied empirical evidence for the coupling
effect. The experiment went as follows:

(1) We derived a test data set T of 49 cases to pass the mutation test.
(The large size of T was due to our inexperience.)

(2) For efficiency reasons, we reduced T heuristically to a test data
set T' consisting of seven cases on which FIND also passed the
mutant test.

(3) Random k-order mutants of FIND, k>1, were generated. (A k-order
mutant comes from k applications of mutant operators on the
program P.)

(4)  The  k-order  mutants  of  FIND  were  then  executed  on  T’. 

The coupling effect says that the non-equivalent k-order mutants of
FIND will fail on T'. Note that step 2 biases the experiment against the
coupling effect since it removes the man-machine orientation of our
approach to testing. We would have been quite happy to find a
counterexample to the coupling effect for the mutation system, since it
would have allowed us to improve the set of mutant operators. The
results of the experiment, though, gave evidence that we had chosen a
well coupled set of mutant operators for the pilot system:

 k     Number of k-order mutants     Number successful on T'
 2              21100                          19
>2               1500                           0

The 19 successful mutants were shown to be equivalent to FIND. We
concentrated on the k=2 case since, intuitively, the more one mutates
FIND the more likely one is to get a program that violates the
competent programmer assumption.
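
The notion of a k-order mutant can be made concrete on a toy program. The sketch below
is not the FIND experiment: it assumes a one-statement program I = I + 1 with three
mutable sites (operator, constant, variable) and enumerates every k-order mutant
exhaustively rather than sampling at random. All names are hypothetical.

```python
from itertools import combinations, product

ORIGINAL = ("+", 1, "i")  # the statement I = I + 1
ALTERNATIVES = [("+", "-", "*"), (0, 1, 2), ("i", "j")]
TESTS = [(1, 0), (3, 7)]  # test cases as (I, J) pairs


def run(program, i, j):
    """Evaluate VAR op CONST for the given program description."""
    op, const, var = program
    value = i if var == "i" else j
    if op == "+":
        return value + const
    if op == "-":
        return value - const
    return value * const


def k_order_mutants(k):
    """All programs that differ from ORIGINAL at exactly k sites."""
    mutants = []
    for sites in combinations(range(len(ORIGINAL)), k):
        choices = [
            [alt for alt in ALTERNATIVES[s] if alt != ORIGINAL[s]] for s in sites
        ]
        for picked in product(*choices):
            program = list(ORIGINAL)
            for site, alt in zip(sites, picked):
                program[site] = alt
            mutants.append(tuple(program))
    return mutants


def survivors(k, tests):
    """k-order mutants that agree with ORIGINAL on every test case."""
    return [
        m
        for m in k_order_mutants(k)
        if all(run(m, i, j) == run(ORIGINAL, i, j) for i, j in tests)
    ]


for k in (1, 2, 3):
    print(k, len(k_order_mutants(k)), survivors(k, TESTS))
```

In this toy the single case (1, 0) already eliminates every first-order mutant, yet two
second-order mutants survive until (3, 7) is added. The coupling effect is an empirical
claim about realistic programs and test data, so a toy of this size is neither evidence
for nor against it.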

The major criticism of the experiment concerns step 3. Since the
first-order mutants that compose the k-order mutants are independently
drawn, the resulting k-order mutant is likely to be very unstable
and subject to quick failure, in contrast to the more desirable case
where the k-order mutant contains subtly related changes that
correspond to the subtle errors programmers find so hard to detect.

The current experiment on the coupling effect omits step 2 above
and makes the following important change to step 3:

(3) Randomly generate correlated k-order mutants of the program. By
correlated we mean that each of the k applications of mutant
operators will in some way be related to all of the others; they
could for instance affect the same statement of P, or the same
variable name, or the same statement label, or the same constant.

Once  again,  if  P  passes  the  mutant  test  with  test  data  T,  the  coupling 
effect  says  that  the  correlated  k-order  mutants  of  P  will  fail  on  T. 

For this experiment three programs are being used: FIND, STKSIM
and TRIANG. STKSIM is a program that maintains a stack and allows
the standard operations of clear, push, pop, and top. TRIANG is a
program that, given the lengths of the three legs of a triangle, categorizes
the input as not representing a triangle or as representing a scalene,
isosceles or equilateral triangle [3]. The following is a summary of the
results of the experiment so far:

PROGRAM         K=2                  K=3                  K=4
           number  successes    number  successes    number  successes
FIND        3000       2         3000       0         3000       0
STKSIM      3000       3         3000       0         3000       0
TRIANG      3000       1         3000       1         3000       0

In all cases, the successful correlated k-order mutants have been
shown to be equivalent to the original program.

We have yet to find a non-trivial counterexample to the coupling
effect for our FORTRAN systems. The one successful 3-order mutant of
TRIANG deserves closer examination; indeed, we initially felt that it was
a non-equivalent mutant. The mutant is




      SUBROUTINE TRIANG(I,J,K,MATCH)
C
      INTEGER I,J,K,MATCH
C
C     MATCH IS OUTPUT FROM THE ROUTINE
C     IF MATCH = 1 THE TRIANGLE IS SCALENE
C     IF MATCH = 2 THE TRIANGLE IS ISOSCELES
C     IF MATCH = 3 THE TRIANGLE IS EQUILATERAL
C     IF MATCH = 4 IT IS NOT A TRIANGLE
C
      IF (I .LE. 0 .OR. J .LE. 0 .OR. K .LE. 0) GOTO 500
      MATCH = 0
      IF (I .NE. J) GOTO 10
      MATCH = MATCH + 1
 10   IF (I .NE. K) GOTO 20
      MATCH = MATCH + 2

          MO1: change statement to MATCH = MATCH + K

 20   IF (J .NE. K) GOTO 30
      MATCH = MATCH + 3
 30   IF (MATCH .NE. 0) GOTO 100
      IF (I+J .LE. K) GOTO 500
      IF (J+K .LE. I) GOTO 500
      IF (I+K .LE. J) GOTO 500
      MATCH = 1
      RETURN

 100  IF (MATCH .NE. 1) GOTO 200
      IF (I+J .LE. K) GOTO 500
 110  MATCH = 2
      RETURN

 200  IF (MATCH .NE. 2) GOTO 300

          MO2: change statement to IF (MATCH .NE. K)

      IF (I+K .LE. J) GOTO 500
      GOTO 110

 300  IF (MATCH .NE. 3) GOTO 400
      IF (J+K .LE. I) GOTO 500

          MO3: change statement to IF (J+J .LE. I)

      GOTO 110
 400  MATCH = 3
      RETURN

 500  MATCH = 4
      RETURN
      END

Note that the correlation is with respect to the variable K. The mutant
operators MO1 and MO2 produce incorrect mutants while MO3 produces
a mutant equivalent to TRIANG. Yet the 3-order correlated mutant is
equivalent to TRIANG.

This makes a beautiful illustration of the part of the programming
process that program mutation is trying to exploit. Using the constant
2 in the first two mutated statements is an arbitrary but coupled
decision. Indeed, you can replace both instances of 2 by any positive
constant (or any variable whose value doesn't change between the
execution of the two statements) and you get an equivalent program;
replace only one instance and you get an incorrect program. In a
sense, the constant 2 in those statements is what would be called in the
terminology of formal logic a "bound variable."

5.  An  Analysis  of  How  Mutation  Works 

In this section we will go through a detailed analysis concerning
how and why mutation analysis can be expected to uncover errors
under a wide variety of situations.

5.1.  Trivial  Errors 

If one of the mutants considered is indeed the correct program
then of course the error will be discovered when an attempt is made to
eliminate that particular mutant. Alternatively, if the errors in the
original program act in a reasonably independent manner and each error
is individually captured by a single mutation then the errors will
almost certainly be detected.

Given the vast folklore about large systems failing for extremely
trivial reasons, the ability to detect such simple errors is indeed a
good starting place. However, many errors do not correspond exactly to
the generated mutations, and multiple errors may interact in subtle
fashions. This being the case we must demonstrate that mutation
analysis possesses many more powerful capabilities.

5.2.  Statement  Analysis 

Many programming errors manifest themselves by sections of
code being "dead", that is unexecutable, when they shouldn't be. Also
many bugs are of such a serious nature that any data which executes
the particular statement in error will cause the program to give
incorrect results. These errors may persist for weeks or even years if
the error occurs in a rarely executed section of code.

Accordingly a reasonable first goal for a set of test cases is that
every statement in the program is to be executed at least once [12].

Various authors have presented methods to achieve this goal. Usually
these methods involve the insertion of counters into the straight
line segments of code. When all counters register non-zero values every
statement in the program has been executed at least once.

In mutation analysis we take a different approach to the same
objective. If a statement is never executed then obviously any change
we produce in it will not cause the altered program to produce test
answers differing from the original. However, as a means of directing
the programmer's attention to these errors in a more direct and
unambiguous fashion, a simpler approach is taken. Among the mutations
generated are ones which replace the first statement of every basic
block in turn with a call on a special routine which aborts whenever it
is executed. Obviously these mutations are extremely unstable, since
any data which executes the replaced statement will cause the mutant
to produce an incorrect result, and hence to be eliminated. The
reverse, however, is also true. That is, if any of these mutants survive,
then the statement which the mutation altered has never been
executed. Hence an accounting of the survival of this class of mutations
gives important information about which sections of code have and
have not been executed.
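The abort-on-entry mutants just described can be sketched in a few lines. The sketch below is an illustration only, written in Python rather than FORTRAN; the names Trap, classify and surviving_traps are hypothetical and are not taken from the mutation system. Each basic block of a toy program is numbered, and a "trap" mutant aborts as soon as its block is entered.

```python
class Trap(Exception):
    """Raised by a trap mutant when its replaced statement is reached."""


def classify(x, trap_at=None):
    """Toy program with four basic blocks, numbered 0 to 3.

    Passing trap_at=n simulates the mutant in which the first statement
    of block n has been replaced by a call on the aborting routine.
    """
    def enter(block):
        if block == trap_at:
            raise Trap(block)

    enter(0)
    if x < 0:
        enter(1)
        return "negative"
    if x == 0:
        enter(2)
        return "zero"
    enter(3)
    return "positive"


def surviving_traps(program, n_blocks, tests):
    """Return the blocks whose trap mutant is never triggered by the tests."""
    survivors = []
    for block in range(n_blocks):
        killed = False
        for t in tests:
            try:
                program(t, trap_at=block)
            except Trap:
                killed = True
                break
        if not killed:
            survivors.append(block)
    return survivors


print(surviving_traps(classify, 4, [-1, 5]))     # [2]
print(surviving_traps(classify, 4, [-1, 0, 5]))  # []
```

With the test data -1 and 5 the mutant for block 2 survives, which points directly at the unexecuted x == 0 branch; adding the test value 0 eliminates it.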

Mutation analysis goes even one step further. Some authors have
assumed that not executing a statement is equivalent to deleting it [8].
This is certainly not true. A statement can be executed but still not
serve any useful purpose. In order to investigate this, another class of
mutants generated replaces every statement with a CONTINUE
statement (a convenient FORTRAN NO-OP). The survival or elimination of
these mutations gives more information than merely whether the
statement is executed or not; it indicates whether or not the
statement is performing anything useful. If a statement can be replaced by
a NO-OP with no effect then at best it indicates a waste of machine time
and at worst it is probably indicative of much more serious errors.
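As a rough illustration of the CONTINUE-replacement mutants, the following Python sketch models a program as a list of statements acting on an environment and replaces each statement in turn by a no-op. The program, the helper names (run, noop, surviving_deletions) and the treatment of a crashing mutant as eliminated are all assumptions of this sketch, not details of the mutation system.

```python
def run(statements, x):
    """Execute a list of statements on a fresh environment; return env['s']."""
    env = {"x": x}
    for stmt in statements:
        stmt(env)
    return env.get("s")


def noop(env):
    """Stand-in for the FORTRAN CONTINUE statement."""


def s_init(env):
    env["s"] = 0


def s_useless(env):
    env["t"] = env["x"] * 2      # t is never used afterwards


def s_add(env):
    env["s"] = env["s"] + env["x"]


PROGRAM = [s_init, s_useless, s_add]


def surviving_deletions(program, tests):
    """Indices of statements that can be replaced by a no-op unnoticed."""
    expected = [run(program, t) for t in tests]
    survivors = []
    for i in range(len(program)):
        mutant = program[:i] + [noop] + program[i + 1:]
        try:
            results = [run(mutant, t) for t in tests]
        except KeyError:
            continue             # the mutant failed outright: eliminated
        if results == expected:
            survivors.append(i)
    return survivors


print(surviving_deletions(PROGRAM, [1, 2, 3]))  # [1]
print(surviving_deletions(PROGRAM, [0]))        # [1, 2]
```

Statement 1 survives under any test data because it does nothing useful. Statement 2 survives only under the weak test set [0], where the statement is executed but its effect is never observable.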

Merely being able to execute every statement in the program is no
guarantee that the code is correct [7,10]. Problems such as
coincidental correctness or predicate errors may pass undetected even if the
statement in error is executed repeatedly. In subsequent sections we
will show how mutation analysis deals with these problems.

5.3.  Branch  Analysis 

Some  authors  have  pointed  out  [12]  that  an  improvement  over 
statement  analysis  can  be  achieved  by  insuring  that  every  flowchart 
branch  is  executed  at  least  once.  For  example  the  following  program 
segment 

A;
IF (expression)
    THEN B;
C;

has  the  flowchart  shown  in  figure  2. 


figure  2 


All  three  statements  A,B  and  C  can  be  executed  by  a  single  test 
case.  It  is  not  true,  however,  that  in  this  case  all  branches  have  been 
executed.  For  example  in  this  case  the  empty  else  clause  branch  (a) 
has  been  ignored. 




We can state the requirement that every branch be taken in an
equivalent manner by requiring that every predicate expression must
evaluate both TRUE and FALSE. It is this formalization which is used in
mutation analysis.

Among the mutants generated are ones which replace each relational
expression and each logical expression by the logical constants
TRUE and FALSE. Of course, like the statement analysis mutations,
these are very unstable and easily eliminated by almost any data. But
if they survive they point directly and unambiguously to a weakness in
the test data which might shield a potential error.

By mutating each relational or logical expression independently we
actually achieve a stronger goal than that achieved by usual branch
analysis.

Consider the compound predicate

IF (A ≤ B AND C ≤ D) THEN

The usual branch analysis method would only require two test cases to
test this predicate. If the test points were (A<B, C<D) and (A<B, C>D)
this would have the effect of only testing the second clause, and not the
first. This is because branch analysis fails to take into account the
"hidden paths" [4] implicit in compound predicates (see figure 3).


figure 3

In testing all the "hidden paths" mutation analysis would require
at least three points to test this predicate. The three points
correspond to the branches (A > B, C > D), (A ≤ B, C > D), and
(A ≤ B, C ≤ D).
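The effect of mutating each clause independently can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the mutant set contains only the four clause-level replacements by TRUE and FALSE, and the concrete test points are invented for the example. It reports which mutants survive a given set of test points.

```python
def original(a, b, c, d):
    return a <= b and c <= d


# Each mutant replaces one relational expression by a logical constant.
MUTANTS = {
    "A<=B -> TRUE":  lambda a, b, c, d: True and c <= d,
    "A<=B -> FALSE": lambda a, b, c, d: False and c <= d,
    "C<=D -> TRUE":  lambda a, b, c, d: a <= b and True,
    "C<=D -> FALSE": lambda a, b, c, d: a <= b and False,
}


def survivors(tests):
    """Names of the mutants that agree with the original on every test."""
    return [name for name, mutant in MUTANTS.items()
            if all(mutant(*t) == original(*t) for t in tests)]


# The two branch-coverage points from the text: (A<B, C<D) and (A<B, C>D).
two_points = [(1, 2, 1, 2), (1, 2, 2, 1)]
print(survivors(two_points))                   # ['A<=B -> TRUE']
print(survivors(two_points + [(2, 1, 1, 2)]))  # []
```

The two branch-coverage points leave the mutant on the first clause alive, so the first clause has not really been tested. In this sketch the third point was chosen with the first clause false and the second true, since that is the kind of point that separates this particular mutant from the original.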

As an example of this consider the program shown in figure 4,
adapted from [8]. The program, which was also studied in [17], is
intended to derive the number of days between two given days in a
given year. The IF statement which determines whether a year is a leap
year or not is, however, incorrect in this version. Notice that if a year
is divisible by 400 (year REM 400 = 0) it is necessarily divisible by 100
(year REM 100 = 0). Hence the logical expression formed by the
conjunction of these two terms is equivalent to just the second term
alone. Alternatively, the expression year REM 100 = 0 can be replaced
by the logical constant TRUE and the resulting mutant will be
equivalent to the original. Since this is obviously not what the
programmer had in mind the error is discovered.

PROCEDURE calendar (INTEGER VALUE day1, month1, day2, month2, year);
BEGIN
   INTEGER days;

   IF month2 = month1 THEN days := day2 - day1
      COMMENT if the dates are in the same month, we can compute
              the number of days between them immediately;
   ELSE
   BEGIN
      INTEGER ARRAY daysin (1 .. 12);

      daysin(1) := 31;  daysin(3) := 31;  daysin(4) := 30;
      daysin(5) := 31;  daysin(6) := 30;  daysin(7) := 31;
      daysin(8) := 31;  daysin(9) := 30;  daysin(10) := 31;
      daysin(11) := 30; daysin(12) := 31;

      IF ((year REM 4) = 0) OR
         ((year REM 100) = 0 AND (year REM 400) = 0)
         THEN daysin(2) := 28
         ELSE daysin(2) := 29;
      COMMENT set daysin(2) according to whether or not year
              is a leap year;

      days := day2 + (daysin(month1) - day1);
      COMMENT this gives (the correct number of days - days
              in complete intervening months);

      FOR i := month1 + 1 UNTIL month2 - 1 DO
         days := daysin(i) + days;
      COMMENT add in the days in complete intervening months;
   END;

   WRITE(days)
END;


figure  4 
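The equivalence of the TRUE-replacement mutant for the leap-year predicate can be confirmed by brute force over a range of years. The Python sketch below transcribes only the predicate of the IF statement in figure 4 (REM is written as %, which agrees with it for positive years); the function names and the range 1 to 4000 are choices made for this illustration.

```python
def leap_test_original(year):
    # Transcription of the (incorrect) predicate in figure 4.
    return (year % 4 == 0) or ((year % 100 == 0) and (year % 400 == 0))


def leap_test_mutant(year):
    # Mutant: the relational expression "year REM 100 = 0" replaced by TRUE.
    return (year % 4 == 0) or (True and (year % 400 == 0))


def distinguishing_years(first, last):
    """Years on which the mutant and the original predicate disagree."""
    return [y for y in range(first, last + 1)
            if leap_test_original(y) != leap_test_mutant(y)]


print(distinguishing_years(1, 4000))  # []
```

No year in the range separates the two predicates, so no test data drawn from it can eliminate the mutant. It is this failure to eliminate the mutant that draws the tester's attention to the predicate.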


5.4. Data Flow Analysis

During execution a program may access a variable in one of three
ways. A variable is defined if the result of a statement is to assign a
value to the variable. A variable is referenced if the statement requires
the value of the variable to be accessed. Finally a variable is undefined
if the semantics of the language do not explicitly give any other value
to the variable. Examples of the latter are the values of local variables
on invocation or procedure return, or DO loop indices in FORTRAN on
normal DO loop termination.

Fosdick  and  Osterweil  [16]  have  defined  three  types  of  data  flow 
anomalies  which  are  often  indicative  of  program  errors.  These 
anomalies  are  consecutive  accesses  to  a  variable  of  the  forms: 

1) undefined and then referenced

2) defined and then undefined

3) defined and then defined again

The  first  is  almost  always  indicative  of  an  error,  even  if  it  occurs 
only  on  a  single  path  between  the  place  where  the  variable  becomes 
undefined  and  the  reference  place.  The  second  and  third,  however,  may 
not  be  indications  of  errors  unless  they  occur  on  every  path  between 
the  two  statements. 

Although the first type of anomaly is not attacked by mutations
per se, it is attacked by the mutation system, which is a large
interpretive system for automatically generating and testing mutants.
Whenever the value of a variable becomes undefined it is set to a unique
constant undefined. Before every variable reference a check is performed
to see if the variable has this value. If the variable does, the error is
reported to the user, who can take corrective action.
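The run-time check for undefined-then-referenced anomalies can be pictured with a small variable store. This is a minimal sketch in Python; the class Store, the marker UNDEFINED and the exception UndefinedReference are hypothetical names, and the real mutation system is an interpreter whose internals are not described here.

```python
class UndefinedReference(Exception):
    """Reported when a variable is referenced while undefined."""


UNDEFINED = object()   # unique marker, distinct from every ordinary value


class Store:
    """Variable store that checks every reference against the marker."""

    def __init__(self, names):
        # All variables start out undefined.
        self._values = {name: UNDEFINED for name in names}

    def define(self, name, value):
        self._values[name] = value

    def undefine(self, name):
        # For example, a DO loop index on normal loop termination.
        self._values[name] = UNDEFINED

    def reference(self, name):
        value = self._values[name]
        if value is UNDEFINED:
            raise UndefinedReference(name)
        return value


store = Store(["i", "total"])
store.define("total", 0)
print(store.reference("total"))      # 0
try:
    store.reference("i")
except UndefinedReference as err:
    print("undefined reference to", err)
```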

The second and third types of anomalies are attacked more
directly. If a variable is defined and not used then usually the
statement can be eliminated with no obvious change (by the CONTINUE
insertion mutations described in the last section). This may not be the
case if, for example, in the course of defining the variable a function
with side effects is invoked. In this case the definition can likely be
mutated in any number of different ways which, while preserving the
side effect, obviously result in the variable being given different values.
An attempt to remove these mutations will almost certainly result in
the anomaly being discovered.

5.5.  Predicate  Testing 

Howden [10] has defined two broad categories of program errors
under the names domain errors and computation errors. The notions
are not precise and it is difficult with many errors to decide which
category they belong in. Informally, however, a domain error occurs
when a specific input follows the wrong path due to an error in a
control statement. A computation error occurs when an input follows the
correct path but because of an error in computation statements the
wrong function is computed for one or more of the output variables.

After Howden's study was published, some researchers examined
the question of whether certain testing methodologies might reliably
uncover errors in these or other classification schemes. One method
proposed specifically directed to domain errors was the domain
strategy of White, Cohen and Chandrasekaran [19].

The  reader  is  referred  to  the  references  for  a  more  complete 
presentation  of  the  technical  restrictions  and  applications  of  their 
method,  but  we  can  here  give  an  informal  description  of  how  it  works. 

If a program contains N input variables (including parameters,
array elements and I/O variables) then a predicate can be described
by a surface in the N dimensional input space. Often the predicate is
linear, in which case the surface is an N dimensional hyperplane. Let us
consider a simple two dimensional case where we have input variables I
and J and the predicate in question is

I + 2J ≤ -3

The domain strategy would tell us that in order to test this
predicate we need three test points, two on the line I+2J = -3 and one a
small distance ε from the line (see figure 5).

Assuming a correct outcome from these tests what have we
discovered? We know the line of the predicate must cut the sections of
the triangle AB and BC. Since ε is quite small the chances of the
predicate being one of these alternatives is also small. Hence, although we
don't have complete confidence that the predicate is correct, we do
have a much larger degree of confidence than we could otherwise have
attained.

To see how mutation analysis deals with the same problem we first
observe that it really is not necessary to have both A and C be on the
predicate line. If A is on the line and B and C are on opposite sides of
the line the same result follows. We now describe how mutations
cause these three points to be generated.

As an intuitive aid one can think of mutation analysis as posing
certain alternatives to the predicate in question, and requiring the
tester to supply reasons, in the form of test data, why the alternative
predicate could not be used just as well in place of the original.
These alternatives are constructed in various ways.

A number of the alternatives are generated by changing relational
operators. Changing an inequality operator to a strict inequality
operator, or vice versa, generates a mutant which can only be
eliminated by a test point which exactly satisfies the predicate. For
example changing I+2J ≤ -3 to I+2J < -3 requires the tester to exhibit a point
for which I+2J = -3, hence which satisfies the first predicate but not the
second.
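A quick check makes the point concrete. In the Python sketch below (the function names are invented for this illustration) a test point eliminates the strict-inequality mutant exactly when the original predicate and the mutant disagree on it, which happens only on the line I+2J = -3.

```python
def original(i, j):
    return i + 2 * j <= -3


def mutant(i, j):
    # Relational-operator mutant: <= changed to <.
    return i + 2 * j < -3


def kills(point):
    """True when the test point distinguishes the mutant from the original."""
    return original(*point) != mutant(*point)


print(kills((0, 0)))     # False: both predicates are false
print(kills((-5, -5)))   # False: both predicates are true
print(kills((-3, 0)))    # True:  -3 + 2*0 = -3, exactly on the line
print(kills((1, -2)))    # True:   1 + 2*(-2) = -3, exactly on the line
```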

A second class of alternatives involves the introduction of the
unary operator "twiddle" (denoted ++ or --). Twiddle is an example of a
non-FORTRAN language construction used to facilitate the mutation
process. For an integer expression a, ++a has the meaning a + 1. For
real expressions ++a means a + 1/100. --a has a similar meaning
involving subtraction.

Graphically, the effect of introducing twiddle is to move the
proposed constraint a small distance parallel to the original line (see
figure 6). In order to eliminate these mutants a data point must be
found which satisfies one constraint but not the other, hence is very
close to the original constraint line.
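The twiddle operators and their effect on the example predicate I + 2J ≤ -3 can be sketched as follows. The Python function names are hypothetical, and applying twiddle to the variable I (rather than to some other subexpression) is simply one choice made for the illustration.

```python
def twiddle_up(a):
    """++a: a + 1 for integers, a + 1/100 for reals."""
    return a + 1 if isinstance(a, int) else a + 0.01


def twiddle_down(a):
    """--a: a - 1 for integers, a - 1/100 for reals."""
    return a - 1 if isinstance(a, int) else a - 0.01


def original(i, j):
    return i + 2 * j <= -3


def mutant_up(i, j):        # ++I + 2J <= -3
    return twiddle_up(i) + 2 * j <= -3


def mutant_down(i, j):      # --I + 2J <= -3
    return twiddle_down(i) + 2 * j <= -3


# Integer points: only points on or next to the line separate the predicates.
print(original(-3, 0) != mutant_up(-3, 0))      # True  (I + 2J = -3)
print(original(-2, 0) != mutant_down(-2, 0))    # True  (I + 2J = -2)
print(original(0, 0) != mutant_up(0, 0))        # False (far from the line)
print(original(0, 0) != mutant_down(0, 0))      # False
```

For real-valued data the shift is only 1/100, so a killing point has to lie within that distance of the constraint line.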

Finally a third class of alternatives are constructed by changing
each data reference into all other syntactically correct data
references, and each operator into all other syntactically correct operators.
The effects of these are related to the phenomenon of spoilers, which
are described in section 5.9.

The total effect caused by so many alternatives is to increase the
number of data points necessary for their elimination, hence by a
process similar to that of Cohen et al. [19] to increase our confidence that
the predicate is indeed correct.

In  order  to  more  graphically  illustrate  the  construction  of  these 
alternatives  and  demonstrate  their  utility  we  will  go  through  a  small 
example.  The  program  in  figure  7  was  taken  from  [19].  No 
specifications  were  given,  but  the  program  can  be  compared  against  a 
presumably  "correct"  version.  It  was  chosen  here  because  it  only 
involves two input variables, hence the alternatives can be easily
illustrated in a graphical manner.


READ I,J;
IF I ≤ J + 1
    THEN K = I + J - 1;
    ELSE K = 2*I + 1;
IF K ≥ I + 1
    THEN L = I + 1;
    ELSE L = J - 1;
IF I = 5
    THEN M = 2*L + K;
    ELSE M = L + 2*K - 1;
WRITE M;

figure 7


 1. IF (I ≤ J + 0)
 2. IF (I ≤ J + 2)
 3. IF (I ≤ J + I)
 4. IF (I ≤ J + J)
 5. IF (1 ≤ J + 1)
 6. IF (2 ≤ J + 1)
 7. IF (5 ≤ J + 1)
 8. IF (I ≤ 1 + 1)
 9. IF (I ≤ 2 + 1)
10. IF (I ≤ 5 + 1)
11. IF (I ≤ J + 5)
12. IF (-I ≤ J + 1)
13. IF (++I ≤ J + 1)
14. IF (--I ≤ J + 1)
15. IF (I ≤ -J + 1)
16. IF (I ≤ ++J + 1)
17. IF (I ≤ --J + 1)
18. IF (I ≤ -(J + 1))
19. IF (I ≤ ++(J + 1))
20. IF (I ≤ --(J + 1))
21. IF (.NOT. I ≤ J + 1)
22. IF (I ≤ J - 1)
23. IF (I ≤ MOD(J,1))
24. IF (I ≤ J/1)
25. IF (I ≤ J*1)
26. IF (I ≤ J**1)
27. IF (I ≤ J)
28. IF (I ≤ 1)
29. IF (I < J + 1)
30. IF (I = J + 1)
31. IF (I ≠ J + 1)
32. IF (I > J + 1)
33. IF (I ≥ J + 1)

figure 8




As you can see the program has three predicates: I ≤ J+1, K ≥ I+1
and I = 5. We will illustrate only the effects of changing the first.

Figure 8 gives a listing of all the alternatives tried for the
predicate I ≤ J+1. Some of the choices are redundant, for example ++I ≤
J+1 and I ≤ --J+1. This is because the mutations are generated in an
entirely mechanical way. It is our feeling that the processing time lost
because of redundant mutations is much less than the time which
would be required to eliminate them by preprocessing the alternatives.
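Redundancy among mechanically generated alternatives can be made visible by grouping them according to their behaviour on a grid of integer inputs. The Python sketch below covers only a handful of alternatives and a grid from -5 to 5, both chosen for this illustration. Agreement on a finite grid suggests redundancy but does not prove it.

```python
from itertools import product

# A few alternatives to the predicate I <= J + 1, as functions of (I, J).
ALTERNATIVES = {
    "++I <= J + 1": lambda i, j: (i + 1) <= j + 1,
    "I <= --J + 1": lambda i, j: i <= (j - 1) + 1,
    "I <= J":       lambda i, j: i <= j,
    "I < J + 1":    lambda i, j: i < j + 1,
    "I <= J + 2":   lambda i, j: i <= j + 2,
}

GRID = list(product(range(-5, 6), repeat=2))


def behaviour_classes(alternatives, grid):
    """Group alternatives that give identical results on every grid point."""
    classes = {}
    for name, pred in alternatives.items():
        signature = tuple(pred(i, j) for i, j in grid)
        classes.setdefault(signature, []).append(name)
    return list(classes.values())


for group in behaviour_classes(ALTERNATIVES, GRID):
    print(group)
# ['++I <= J + 1', 'I <= --J + 1', 'I <= J', 'I < J + 1']
# ['I <= J + 2']
```

On integer data the first four alternatives all reduce to I ≤ J, so any test point that eliminates one of them eliminates the others at no extra cost to the tester.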

The  alternative  predicates  so  introduced  are  illustrated  in  figure 
9.  The  original  predicate  is  the  heavy  line  running  from  the  lower  left 
to  the  upper  right. 

In  the  paper  from  which  the  example  program  was  taken  the 
authors  hypothesize  that  the  program  contains  the  following  four 
errors. 

1) The predicate K ≥ I+1 should be K ≥ I+2.

2) The predicate I = 5 should be I = 5-J.

3) The statement L = J-1 should be L = I-2.

4) The statement K = I+J-1 should read

      THEN IF (2*J < -5*I - 40)
              THEN K = 3;
              ELSE K = I+J-1;

We leave it as an exercise to verify that the attempt to eliminate
the alternative K ≥ I+2 must necessarily end with the discovery of the
first error. Note that this is not trivially the case since errors 1 and 4
can interact in a subtle fashion. In later sections we will show how the
remaining three errors are dealt with.
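As a starting point for the exercise, the example program and the alternative K ≥ I+2 can be compared directly. The Python transcription below assumes the reading of figure 7 in which the first two predicates are I ≤ J+1 and K ≥ I+1; the parameter offset and the function killing_points are devices of this sketch, and the interaction with error 4 is not modelled.

```python
def program(i, j, offset=1):
    """Example program of figure 7.

    offset is the constant in the second predicate, K >= I + offset;
    offset=2 gives the alternative K >= I + 2.
    """
    k = i + j - 1 if i <= j + 1 else 2 * i + 1
    l = i + 1 if k >= i + offset else j - 1
    m = 2 * l + k if i == 5 else l + 2 * k - 1
    return m


def killing_points(lo, hi):
    """Integer inputs on which the program and the alternative differ."""
    return [(i, j)
            for i in range(lo, hi + 1)
            for j in range(lo, hi + 1)
            if program(i, j, 1) != program(i, j, 2)]


print(program(1, 2, 1), program(1, 2, 2))   # 5 4
print(killing_points(-3, 3))
```

Every point returned has K = I+1, that is, it lies exactly on the boundary of the second predicate. A tester who supplies such a point in order to eliminate the alternative is therefore obliged to examine the output precisely where the two versions disagree.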

5.6.  Domain  Pushing 

One very important mutation which was mentioned in the last
section is the introduction of unary operators into the program. These
unary  operators  are  introduced  wherever  they  are  syntactically 
correct  according  to  the  rules  of  FORTRAN  expression  construction.  In 
addition  to  the  operators  ++  and  --  discussed  in  the  last  section,  the 
remaining  unary  operators  are  -  (arithmetic  negation)  and  a  class  of 
non  FORTRAN  operators  !  (absolute  value),  -!  (negative  absolute  value) 
and  Z!  (zero  value).  It  is  the  last  three  which  will  be  of  most  concern  to 
us  in  this  section. 

Consider the statement

A = B + C

In order to eliminate the mutants

A = !B + C
A = B + !C
A = !(B + C)

we must generate a set of test points where B is negative (so that B+C
will differ from !B+C), C is negative and the sum B+C is negative¹.

1) Notice that if it is impossible for B to be negative then this is an equivalent mutation, that
is the altered program is equivalent to the original. In this case the proliferation of these
alternatives can either be a nuisance or an important documentation aid, depending upon the




Similarly  negative  absolute  value  insertion  forces  the  test  data  to  be 
positive.  We  use  the  term  data  pushing  for  this  process,  meaning  the 
mutations  push  the  tester  into  producing  test  cases  where  the  domains 
satisfy  the  given  requirements. 

Zero value is an operator defined such that Z! exp is exp if the
value is non-zero, otherwise if the expression evaluates to zero the
value is an arbitrarily chosen large positive constant. Hence the
elimination of this mutant requires a test set where the expression has the
value zero.
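The three domain-pushing operators can be modelled directly. In the Python sketch below the names absval, negabs and zpush stand for !, -! and Z!, and BIG is an arbitrary stand-in for the large positive constant; all of these names, and the particular selection of mutants, are assumptions of the illustration.

```python
BIG = 10**9   # arbitrary large positive constant (assumed for this sketch)


def absval(x):           # !x
    return abs(x)


def negabs(x):           # -!x
    return -abs(x)


def zpush(x):            # Z!x
    return x if x != 0 else BIG


def original(b, c):      # A = B + C
    return b + c


MUTANTS = {
    "!B + C":   lambda b, c: absval(b) + c,
    "B + !C":   lambda b, c: b + absval(c),
    "!(B + C)": lambda b, c: absval(b + c),
    "-!B + C":  lambda b, c: negabs(b) + c,
    "Z!B + C":  lambda b, c: zpush(b) + c,
}


def survivors(tests):
    """Names of the mutants not eliminated by the given (B, C) test points."""
    return [name for name, mutant in MUTANTS.items()
            if all(mutant(b, c) == original(b, c) for b, c in tests)]


print(survivors([(1, 2)]))
# ['!B + C', 'B + !C', '!(B + C)', 'Z!B + C']
print(survivors([(1, 2), (-1, 5), (5, -1), (-2, -3), (0, 1)]))
# []
```

A single all-positive test point eliminates only the negative absolute value mutant. Each of the remaining mutants pushes the tester toward a particular region: B negative, C negative, B+C negative, and B zero.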

Multiply  this  process  by  every  position  where  an  absolute  value 
sign can be inserted and you can see a scattering effect, where the
tester is forced to include test cases acting in various conditions in
numerous  problem  domains.  Very  often  in  the  presence  of  an  error  this 
scattering  effect  will  cause  a  test  case  to  be  generated  which  will 
demonstrate  the  error. 

Consider  again  the  example  studied  above.  Figure  10  gives  a  list  of 
mutants  and  the  accompanying  graph  shows  the  domains  they  push 
into.  As  you  can  see  even  this  simple  example  generates  an  extremely 
large  number  of  requirements. 

 1. IF (!I ≤ J + 1)
 2. IF (I ≤ !J + 1)
 3. IF (I ≤ !(J + 1))
 4. K = (!I + J) - 1
 5. K = (I + !J) - 1
 6. K = !(I + J) - 1
 7. K = !((I + J) - 1)
 8. K = 2 * !I + 1
 9. K = !(2 * I) + 1
10. K = !(2 * I + 1)
11. IF (!K ≥ I + 1)
12. IF (K ≥ !I + 1)
13. IF (K ≥ !(I + 1))
14. L = !I + 1
15. L = !(I + 1)
16. L = !J - 1
17. L = !(J - 1)
18. IF (!I = 5)
19. M = 2 * !L + K
20. M = !2 * L + K
21. M = 2 * L + !K
22. M = !(2 * L + K)
23. M = !L + 2 * K - 1
24. M = L + 2 * !K - 1
25. M = L + !2 * K - 1
26. M = !(L + 2 * K) - 1
27. M = !(L + 2 * K - 1)

figure  10 

Recall  again  that  one  of  the  errors  this  program  was  presumed  to 
contain  was  that  the  statement  L=J-1  should  have  read  L=I-2.  One 
effect of this error is that any test point in the area bounded by I = J+1

tester's point of view. The topic of equivalent mutants will be taken up in section 5.10.


and  I  =  1  will  be  computed  incorrectly.  But  it  is  precisely  this  area  that 
mutants  8,  9  and  10  push  us  into.  This  means  that  this  error  could  not 
have  gone  undiscovered  using  mutation  analysis. 

This process of pushing the programmer into producing data
satisfying some criterion is also often accomplished by other mutations.
Consider the program in figure 12, which is based on a program by
Naur [14], and has been previously studied in the literature [7].

alarm := FALSE;
bufpos := 0;
fill := 0;
REPEAT
   incharacter(cw);
   IF cw = BL OR cw = NL
   THEN
      IF fill + bufpos ≤ maxpos
      THEN BEGIN
         outcharacter(BL);
         END
      ELSE BEGIN
         outcharacter(NL);
         fill := 0 END;
      FOR k := 1 STEP 1 UNTIL bufpos DO
         outcharacter(buffer[k]);
      fill := fill + bufpos;
      bufpos := 0 END
   ELSE
      IF bufpos = maxpos
      THEN alarm := TRUE;
      ELSE BEGIN
         bufpos := bufpos + 1;
         buffer[bufpos] := cw END
UNTIL alarm OR cw = ET

figure  12 

Consider the mutant which replaces the first statement FILL:=0
with the statement FILL:=1. The effect of this mutation is to force a
test case to be defined in which the first word is less than MAXPOS
characters long. This test case then detects one of the five errors in
the program [7]. The surprising thing is that the effect of this
mutation seems to be totally unrelated to the statement in which the
mutation takes place.

5.7.  Special  Values  Testing 

Another form of testing, which has been introduced by Howden [11],
is called special values testing. Special values testing is defined in
terms of a number of "rules", for example

1. Every subexpression should be tested on at least one test case
which forces the expression to be zero.

2.  Every  variable  and  subexpression  should  take  on  a  distinct  set  of 
values  in  the  test  cases. 

That  the  first  rule  is  enforced  by  the  zero  values  mutations  has 
already  been  discussed  in  the  last  section  on  domain  pushing. 


That the second rule is important is undeniable. If two variables
are always given the same value then they are not acting as "free
variables" and a reference to one can be universally replaced with a
reference to the second. In fact this is exactly what happens in this case,
and the existence of these mutations enforces the goals of the distinct
values rule.

A slightly more general method of enforcing this goal can be
constructed as follows: A special array exactly as large as the number of
subexpressions computed in the program is kept, with two additional
tag bits for each entry in this array. Initially all tag bits are off,
indicating the array is uninitialized. As each subexpression is encountered in
turn the value at that point is recorded in the array and the first tag
bit is set. Subsequently when the subexpression is again encountered, if
the second tag bit is still off the current value of the expression is
compared against the recorded value. If they differ the second tag bit is
set. Otherwise no change is made.

In this fashion by counting those expressions in which the second
tag bit is OFF and the first ON one can infer which subexpressions have
not altered their value over the test case executions, and hence one
can construct mutations to reveal this. This method is similar to one
used  in  a  compiler  system  by  Hamlet[B). 
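The tag-bit scheme can be sketched as a small recorder that is consulted each time a subexpression is evaluated. This is an illustration in Python; the class ValueTracker and its method names are hypothetical, and subexpressions are identified by integer indices assumed to be assigned beforehand.

```python
class ValueTracker:
    """One recorded value and two tag bits per subexpression."""

    def __init__(self, n_subexpressions):
        self.recorded = [None] * n_subexpressions
        self.seen = [False] * n_subexpressions      # first tag bit
        self.changed = [False] * n_subexpressions   # second tag bit

    def observe(self, index, value):
        """Record one evaluation of subexpression `index`; return the value."""
        if not self.seen[index]:
            self.recorded[index] = value
            self.seen[index] = True
        elif not self.changed[index] and value != self.recorded[index]:
            self.changed[index] = True
        return value

    def constant_subexpressions(self):
        """Subexpressions evaluated at least once whose value never varied."""
        return [i for i in range(len(self.seen))
                if self.seen[i] and not self.changed[i]]


tracker = ValueTracker(3)
for x, y in [(1, 4), (2, 4), (3, 4)]:
    tracker.observe(0, x + y)   # varies over the test cases
    tracker.observe(1, y * 2)   # always 8
print(tracker.constant_subexpressions())   # [1]
```

Subexpression 2 is never evaluated and so is not reported; it would be caught instead by the statement analysis mutations.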

5.8.  Coincidental  Correctness 

We say the result of evaluating a given test point is coincidentally
correct if the result matches the intended value in spite of the fact
that the function used to compute the value is incorrect. For example
if all our test data results in the variable I having the values 0 or 2,
then the computation J = I*2 could be coincidentally correct if what
was intended was J = I**2.

The problem of coincidental correctness is really central to
program testing. Every programmer who tests an incorrect program, and
deems it to be correct, has really encountered an incidence of
coincidental correctness. Yet with the exception of mutation analysis no
testing methodology in the authors' knowledge deals directly with this
problem. Some researchers even go so far as to state that the problems
of coincidental correctness are intractable [19].

In mutation analysis coincidental correctness is attacked by the
use of spoilers. Spoilers implicitly remove from consideration data
points for which the results could obviously be coincidentally correct,
in a sense "spoiling" those data points. For example, by explicitly
making the mutation J = I*2 => J = I**2 we spoil those test cases for which
I = 0 or I = 2, and require that at least one test case have an alternative
value.
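A small Python sketch may make the effect of this spoiler concrete. The function names and the two test sets are assumptions chosen for illustration; only the pair I*2 and I**2 comes from the text.

```python
def original(i):
    return i * 2        # J = I*2


def mutant(i):
    return i ** 2       # J = I**2, the spoiler


def spoiled_points(test_values):
    """Test values on which the original and the mutant agree."""
    return [i for i in test_values if original(i) == mutant(i)]


def kills_mutant(test_values):
    """True if at least one test value distinguishes the two programs."""
    return any(original(i) != mutant(i) for i in test_values)


weak_tests = [0, 2]        # every point is coincidentally correct
strong_tests = [0, 2, 3]   # 3 is an "alternative value"
```

With `weak_tests` the mutant survives, so the tester is pushed to add a point such as I = 3, at which the two computations differ (6 versus 9).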

Using again the example program introduced above, figures 13
and 14 show the spoilers and their effects associated with the statement
M=L+2*K-1. Notice a single spoiler may be associated with up to
four different lines depending upon the outcomes of the first two
predicates in the program. Pictorially, the effects of spoilers are that within
each data domain for each line there must be at least one test case
which does not lie on the given line. In broad terms the effects of this
are to require a large number of data points for which the possibilities
of coincidental correctness are very slight.




 1. M = (L + 1*K) - 1
 2. M = (L + 3*K) - 1
 3. M = (I + 2*K) - 1
 4. M = (J + 2*K) - 1
 5. M = (K + 2*K) - 1
 6. M = (L + 2*I) - 1
 7. M = (L + 2*J) - 1
 8. M = (L + 2*L) - 1
 9. M = (L + I*K) - 1
10. M = (L + J*K) - 1
11. M = (L + K*K) - 1
12. M = (L + L*K) - 1
13. M = (L + 2*K) - I
14. M = (L + 2*K) - J
15. M = (L + 2*K) - K
16. M = (L + 2*K) - L
17. M = (1 + 2*K) - 1
18. M = (2 + 2*K) - 1
19. M = (5 + 2*K) - 1
20. M = (L + 2*1) - 1
21. M = (L + 2*2) - 1
22. M = (L + 2*5) - 1
23. M = (L + 5*K) - 1
24. M = ( - L + 2*K) - 1
25. M = (L + - 2*K) - 1
26. M = (L + 2* -K) - 1
27. M = (L + 2* - K) - 1
28. M = - (L + 2*K) - 1
29. M = - ((L + 2*K) - 1)
30. M = (L + 2 + K) - 1
31. M = (L + 2 - K) - 1
32. M = (L + MOD(2,K)) - 1
33. M = (L + 2/K) - 1
34. M = (L + 2**K) - 1
35. M = (L + 2) - 1
36. M = (L + K) - 1
37. M = L - 2*K - 1
38. M = (MOD(L,2*K)) - 1
39. M = L/2*K - 1
40. M = L*2*K - 1
41. M = L**(2*K) - 1
42. M = L - 1
43. M = (2*K) - 1
44. M = L + 2*K + 1
45. M = MOD(L + 2*K,1)
46. M = (L + 2*K)/1
47. M = (L + 2*K)*1
48. M = (L + 2*K)**1
49. M = (L + 2*K)
50. M = 1


figure 13




for R1 = 0 by 1 to N begin
    R0 <- a(R1)
    for R2 = R1 + 1 by 1 to N begin
        if a(R2) > R0 then begin
            R0 <- a(R2)
            R3 <- R2
        end
    end
    R2 <- a(R1)
    a(R1) <- R0
    a(R3) <- R2
end


figure 15

Often the fact that two expressions are coincidentally the same
over the input data is an indication of program error or poor testing.
For example, the sorting program shown in figure 15, taken from a
paper by Wirth [20], will perform correctly for a large number of input
values. If, however, the statements following the IF statement are
never executed for some loop iteration, it is possible for R3 to be
incorrectly set, and an incorrectly sorted array may be produced.

By constructing the mutant which replaces the statement a(R1) <-
R0 with a(R1) <- a(R3) we point out that there are two ways of defining
R0, only one of which is used in the test data. Therefore the error is
uncovered.
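The following Python transcription of figure 15 is a sketch only. It assumes zero-based indexing over a list of n elements and an initial value of 0 for R3 (the figure leaves R3 unset); the function name and the `mutant` flag are hypothetical.

```python
def selection_sort(values, mutant=False):
    """Transcription of figure 15, including its error (sorts descending).

    With mutant=True the statement a(R1) <- R0 is replaced by
    a(R1) <- a(R3), as in the mutant discussed in the text.
    """
    a = list(values)
    n = len(a)
    r3 = 0  # assumption: the figure never initializes R3
    for r1 in range(n):
        r0 = a[r1]
        for r2 in range(r1 + 1, n):
            if a[r2] > r0:
                r0 = a[r2]
                r3 = r2
        saved = a[r1]                     # R2 <- a(R1)
        a[r1] = a[r3] if mutant else r0   # a(R1) <- R0, or the mutant form
        a[r3] = saved                     # a(R3) <- R2; r3 may be stale here
    return a
```

On the input `[1, 3, 2]` both versions return `[3, 2, 1]`, so the mutant survives. An input such as `[2, 3, 4]`, where the IF body is skipped in a later iteration while R3 still holds an old index, separates them: the original returns `[4, 3, 3]` and the mutant `[4, 2, 3]`. Neither is sorted, so the test case needed to eliminate the mutant also exposes the error.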

5.9.  Missing  Path  Errors 

As identified by Howden [10], we can say a program contains a
missing path error if a predicate is required which does not appear in
the program under test, causing some data to be computed by the same
function when really different functions are called for. These missing
predicates can really be the result of two different problems, however,
so we might consider the following definitions.

A program contains a specificational missing path error if two
cases which are treated differently in the specifications are incorrectly
combined into a single function in the program. On the other hand a
program contains a computational missing path error if within the
domain of a single specification a path is missing which is required only
because of the nature of the algorithm or data involved.

An example of the first type of path error is error number four
from the example in section 5.5. Although this error might result from
a specification, there is nothing in the code itself which would give any
hint that the data in the range 2*j < -5*i - 40 is to be handled any
differently than given in the test program.

For an example of the second class of error consider the subroutine
shown in figure 16, adapted from [13]. The inputs are a sorted
table of numbers and an element which may or may not be in the table.
The only specification is that upon return X(LOW) <= A <= X(HIGH), and
HIGH <= LOW + 1. The problem arises if the program is presented with
a table of only one entry, in which case the program loops forever.

Nothing in the specifications states that a table with only one entry
is to behave any differently from a table with multiple entries; it is only


      SUBROUTINE BIN(X,N,A,LOW,HIGH)
      INTEGER X(N),N,A,LOW,HIGH
      INTEGER MID
      LOW = 1
      HIGH = N
    6 IF (HIGH - LOW - 1) 7,12,7
   12 STOP
    7 MID = (LOW + HIGH) / 2
      IF (A - X(MID)) 9,10,10
    9 HIGH = MID
      GOTO 6
   10 LOW = MID
      GOTO 6
      END

figure 16

because  of  the  algorithm  used  that  this  must  be  treated  as  a  special 
case. 

Problems  of  the  second  type  are  usually  caused  by  the  necessity 
to  treat  certain  values,  for  example  negative  numbers,  differently  from 
others.  This  being  the  case  the  process  of  data  pushing  and  spoiling 
described  in  sections  5.6  and  5.8  will  often  lead  to  the  detection  of 
these  errors.  So  it  is  in  this  case  where  an  attempt  to  remove  either  of 
the  following  mutants  will  cause  us  to  generate  a  test  case  with  a  single 
element. 

IF (HIGH - LOW - 1) 12,12,7
MID = (LOW + HIGH) - 2
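The behavior of the first of these two mutants can be sketched in Python. This transcription of figure 16 is illustrative only: the function name, the `mutant` flag, and the `max_steps` cap (used here to report non-termination instead of looping forever) are assumptions, and the 1-based FORTRAN subscripts are kept by indexing `x[mid - 1]`.

```python
def bin_search(x, a, mutant=False, max_steps=1000):
    """Transcription of subroutine BIN from figure 16.

    Returns (LOW, HIGH), or None if the loop has not exited after
    max_steps iterations. With mutant=True the arithmetic IF is
    IF (HIGH - LOW - 1) 12,12,7 instead of 7,12,7.
    """
    low, high = 1, len(x)
    for _ in range(max_steps):
        d = high - low - 1
        done = (d <= 0) if mutant else (d == 0)
        if done:
            return low, high
        mid = (low + high) // 2
        if a - x[mid - 1] < 0:
            high = mid
        else:
            low = mid
    return None  # treated as "loops forever"
```

For a table of several entries, for example `bin_search([1, 3, 5, 7], 4)`, both versions return `(2, 3)`. They differ only when HIGH - LOW - 1 is negative, which happens for a one-entry table: the mutant returns `(1, 1)` while the original never leaves the loop.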

Since  mutation  analysis,  like  most  other  testing  methodologies, 
deals  only  with  the  program  under  test  (as  opposed  to  dealing  with  the 
specifications  of  those  programs),  the  problems  of  detecting 
specificational  missing  path  errors  are  much  more  difficult.  Since 
mutation  analysis  causes  the  tester  to  generate  a  number  of  data 
points  which  exercise  the  program  in  a  multiplicity  of  ways  our 
chances  of  stumbling  into  the  area  where  the  program  misbehaves  are 
high,  but  are  by  no  means  certain. 

So it is with the missing path error from the example in section
5.5. It is possible to generate test data which passes our test criterion
but which fails to detect the missing path error. We view this not as a
failure of mutation analysis, however, but as a fundamental limitation
in the testing process. In the authors' view the only way that these sorts
of problems have a hope of being eliminated is to start with a core of
test cases generated from the specifications, independent of the program
implementation. This core of test cases can then be augmented
to achieve goals such as those presented by mutation analysis. Some
methods of generating test data from specifications have been
discussed elsewhere [7,17].

5.10.  Equivalent  Mutants 

As was mentioned in a footnote in section 5.6, if a variable is
constrained to being strictly positive (which is often the case) then
inserting an absolute value sign before each reference to that variable will
generate an alternative program, which is in all respects functionally
identical to the original. A mutation which produces such an equivalent
program is called an equivalent mutant.
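As a minimal illustration in Python (the summing loop is a made-up example, not taken from the report), inserting an absolute value around a variable that can only be positive gives a mutant that no test data can distinguish from the original:

```python
def original(n):
    total = 0
    k = 1
    while k <= n:
        total += k
        k += 1
    return total


def mutant(n):
    total = 0
    k = 1
    while k <= n:
        total += abs(k)   # k >= 1 at every reference, so abs() changes nothing
        k += 1
    return total
```

Because `k` starts at 1 and only increases, the two functions agree on every input, and the mutant can only be set aside by reasoning about the code rather than by adding test cases.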

Almost any of the mutation types used in the current system can,
under the right circumstances, produce an equivalent mutant. It has
been observed empirically that with the exception of those mutations
produced by inserting absolute value signs (which often vary widely)
the number of equivalent mutants produced is usually 2-5% of the total
number of mutants.

In the current system no attempt is made to remove equivalent
mutants algorithmically, even though in a large number of cases it
would be possible to do so. The reason for this decision is that even
though equivalent mutants serve no purpose from the point of view of
test data analysis, they serve a very important role in error detection.

No mutant is ever declared equivalent except by an explicit
command from the tester. In order to determine equivalence the tester
must often spend a considerable amount of time examining the code,
and in the process obtain an intimate knowledge of the algorithm and
how it works.

Often a number of mutants can be labeled equivalent on the
strength of a single insight. Examples are recognizing that a variable is
by necessity positive during part of the program, or recognizing that in
a binary search algorithm it doesn't matter how you choose the middle
element as long as it is between the lower and upper bounds.

The fact is, however, that in attempting to remove equivalent
mutants we are forcing the programmer into a very careful review of
the program. How many errors are discovered in this manner is more
of a question in psychology than in program testing, but our
experience has been that often such a careful review will uncover very subtle
errors which would be difficult to discover by other means.

As an example of this process, we must admit that no mutation in
the current system would force the tester into discovering the second
error in the program in section 5.5. (Notice that if J had been
referenced in the section of code following the I=5 predicate then the
process of data pushing would have revealed this error.) Nonetheless the
following mutants are equivalent for the given program. An examination
of these would force the tester almost directly into a review of the
area of code containing the bug. And the search would be intensified if
the tester realized these changes would not be equivalent in the
corrected program.

M = 2*!L + K
M = !2*L + K
M = 2*L + !K
M = !(2*L + K)

6.  Discussion 

After  an  extended  exposition  of  the  mechanics  of  mutation 
analysis  we  are  now  in  a  position  to  take  a  more  global  look  into  why 
this  all  works.  It  seems  to  us  that  there  are  two  general  arguments 
which  can  be  put  forth,  summarized  as  follows: 


1) With respect to error detection, it is not that the mutants
themselves capture the errors which may be in the program, it is
rather that the mutation task forces the tester into finding data
which exercises the program in a multiplicity of ways, and this
exercising is what is likely to uncover the errors.

2) The goal of mutation analysis is difficult to attain (this is
confirmed by more than two years' experience with this process),
and by setting a difficult goal we force the programmer into a very
careful review of the programs. Independent of all other claims
made by this method, merely forcing the programmer to spend an
extended period of time reviewing the coded product will often
lead him into discovering errors in logic or design.

Of  course  we  would  hope  that  the  first  is  the  dominant  reason  for 
discovering  errors  in  programs,  and  indeed  the  studies  we  have  so  far 
conducted  indicate  this.  We  mention  the  second,  however,  because  it  is 
often  significant  in  real  applications,  and  is  a  fact  not  usually  noticed 
by  automated  tool  designers. 

As we saw in section 5.10, the mutations implemented in the
current system are not sufficient to detect all programming errors.
This we view not as a weakness in the methodology but in the mutation
operators used. As we collect more and more examples of such errors
we can look for patterns in the types of errors which can go undetected
by our system. By observing these patterns we may find new mutant
operators which will detect these errors. In this manner the system
may be continually improved, and our understanding of the programming
process itself increased.

We have also observed that as the complexity of programs
increases, the number of "building blocks" from which mutations are
constructed grows (see footnote 2), and the chances for errors like those
just described to go undetected actually diminish. This is perhaps a
novelty: a method which works better on complex programs than on
simple ones!


Acknowledgements 

We  wish  to  thank  Alan  Acree,  Jim  Barns,  Edie  Martin,  and  Dan 
St.  Andre  for  their  contributions  to  the  program  mutation 
effort. 


2) The number of mutants grows roughly in proportion to the number of statements times the
number of unique data references in the program.




[1] T.A. Budd and R.J. Lipton, "Mutation Analysis of Decision Table
Programs", Proceedings of the 1978 Conference on Information
Sciences and Systems, Johns Hopkins University, 1978.

[2] T.A. Budd, R.J. Lipton, F.G. Sayward and R.A. DeMillo, "The Design
of a Prototype Mutation System for Program Testing", AFIPS 1978
NCC, pp. 623-627.

[3] J.R. Brown and M. Lipow, "Testing for Software Reliability",
Proceedings of the 1975 International Conference on Reliable
Software.

[4] R.A. DeMillo, R.J. Lipton and F.G. Sayward, "Hints on Test Data
Selection: Help for the Practicing Programmer", COMPUTER, Vol.
11, 4, April 1978.

[5] R.A. DeMillo, R.J. Lipton and F.G. Sayward, "Program Mutation as a
Tool for Managing Large-Scale Software Development", ASQC
Technical Conference Transactions, Chicago.

[6] M. Geller, "Test Data as an Aid in Proving Program Correctness",
Comm. ACM, Vol. 21, 5, May 1978, pp. 368-375.

[7] J.B. Goodenough and S.L. Gerhart, "Toward a Theory of Test Data
Selection", IEEE Transactions on Software Engineering, June 1975.

[8] R.G. Hamlet, "Testing Programs with the Aid of a Compiler", IEEE
Transactions on Software Engineering, SE-3, 4, July 1977.

[9] C.A.R. Hoare, "Algorithm 65: FIND", Comm. ACM 4,1 (April 1961),
pp. 321.

[10] W.E. Howden, "Reliability of the Path Analysis Testing Strategy",
IEEE Transactions on Software Engineering, September 1976.

[11] W.E. Howden, "An Evaluation of the Effectiveness of Symbolic
Testing", Software - Practice and Experience, Vol. 8, 381-397 (1978).

[12] J.C. Huang, "An Approach to Program Testing", ACM Computing
Surveys, September 1975.

[13] B.W. Kernighan and P.J. Plauger, The Elements of Programming
Style, McGraw Hill, New York, N.Y., 1978 (2nd ed.).

[14] P. Naur, "Programming by Action Clusters", BIT, Vol. 9, pp. 250-258,
1969.

[15] L.J. Osterweil and L.D. Fosdick, "Experience with DAVE - A Fortran
Program Analyzer", Proc. 1976 AFIPS NCC, Vol. 45, pp. 909-915.

[16] L.J. Osterweil and L.D. Fosdick, "Data Flow Analysis as an Aide in
Documentation, Assertion Generation, Validation, and Error
Detection", Technical Report CU-CS-055-74, Department of Computer
Science, University of Colorado, Boulder, September 1974.

[17] T.J. Ostrand and E.J. Weyuker, "Remarks on the Theory of Test Data
Selection", Digest for the IEEE Workshop on Software Testing and
Test Documentation, Ft. Lauderdale, Fl., 1978.

[18] R.J. Rubey, J.A. Dana, and P.W. Biche, "Quantitative Aspects of
Software Validation", IEEE Transactions on Software Engineering,
June 1975.

[19] L.J. White, E.I. Cohen and B. Chandrasekaran, "A Domain Strategy
for Computer Program Testing", Ohio State University Technical
Report OSU-CISRC-TR-78-4, 1978.

[20] N. Wirth, "PL360, a programming language for the 360 computer",
JACM, 15, 37-74 (1968).

[21] E.A. Youngs, Error Proneness in Programming, PhD Thesis,
University of North Carolina, 1971.



Discussion  of  "A  Survey  of  Programming  Testing  Issues" 

Timothy A. Budd, Richard A. DeMillo(+), Richard J. Lipton and
Frederick G. Sayward(*)

In this paper Goodenough addresses a myriad of issues and goals
encountered in testing computer programs. During his discussion of using
testing to show program correctness it is explained that testing can be
used to ensure the absence of program errors provided that one has
successfully performed a test which is both reliable(1) and valid. The
effectiveness of this approach lies in finding reliable and valid tests
which, as observed by Goodenough, can be extremely difficult. Our main
purpose in this note is to comment on these concepts and on their
applicability to testing computer programs, and to suggest an alternative
approach.

As in an earlier paper [5], reliability and validity are defined in
terms of quite precise and formal properties of computer test data
selection criteria. With these definitions a so-called "fundamental
theorem" was proved in [5] which roughly states:

If a test data selection criterion which is valid and reliable can
be used to select test data for a given program, and if the
program is correct on the selected test data, then the program is
correct on any data.


(+) School of Information and Computer Science, Georgia Institute of
Technology, Atlanta, Georgia 30332

(*) Department of Computer Science, Yale University, New Haven, Connecticut
06520

(1) See Goodenough's paper for the formal definitions of these and other
terms which we will draw on in this discussion.




Several comments are in order on this approach. First, reliability and
validity are defined as binary attributes; that is, a test data
selection criterion either has or doesn't have one or both of these properties.
However, intuition says we should expect that a program test is reliable
and valid if it is useful in predicting the correctness of a program - not
necessarily ensuring absolute correctness in the formal sense, but at least
increasing our confidence that the program is indeed correct. Although
Goodenough addresses this aspect indirectly in a footnote, where the idea
of measuring a test data selection criterion's reliability and validity is
discussed in passing, there is the danger that readers will follow him and
not focus on this issue, which we consider to be the most important issue of
his approach to using program testing to ensure program correctness.

Second, we feel that the fundamental theorem provides no useful information
or guidelines for anyone who has to test real programs since it is aimed at
showing absolute correctness. What Goodenough has done is to reiterate the
conclusion of [5] - if you prove that your testing criterion is perfect in
a fairly obvious sense, then your program is correct if it passes the test.
Clearly, this is the expected deduction. He then says that research in
this area of program testing should be directed toward finding reliable and
valid testing methods, or at least establishing how "close" the methods are
to being reliable and valid so that we can judge how "close" to perfect are
programs which pass the test.

As  an  editorial  note,  while  the  fundamental  theorem  of  [5]  shows  that 
validity  and  reliability  are  sufficient  conditions  for  demonstrating 
program  correctness  by  program  testing,  they  certainly  aren't  necessary 
conditions.  Yet  Goodenough  consistently  says  that  program  tests  must  be 
valid and reliable if correctness is to be gotten from testing. Clearly,
this is misleading and could adversely influence future program testing
research efforts.

Goodenough ends on a pessimistic note in stating that, from a
scientific point of view, testing research can hardly be said to be in its
infancy. He, as others in the software engineering community, most notably
the program verification school, continues to point out that program testing
is insufficient to guarantee program correctness. We agree. However,
since all software being used today, indeed all software that has ever been
developed to solve any real problems, has been developed using testing, we
must ask the following rather obvious question:

Given  that  program  testing,  while  not  a  perfect  technique,  has 
proved  to  be  a  very  useful  technique,  how  can  we  develop  testing 
methodologies  which  have  less  than  perfection  (absolute  program 
correctness)  as  their  goals  yet  still  yield  substantial  gains? 

It is clear to us that the future direction of software engineering must
not turn its back and risk not developing this very important research area
the way it should be. It is all too easy, and wrong, to take the popular
viewpoint that program building is a purely logical deductive activity to
which program testing is unsuitable. Our viewpoint is that program design
and development is an empirical engineering activity, and when Goodenough
says that program testing is not even in its infancy, we take it to mean
that an inferential formalism for program testing has not yet been
developed. However, it seems clear that such a formalism is not entirely
necessary if one is willing to accept that programming is a human,
inductive activity which may never be subject to complete formalism.

In the remainder of this discussion we will overview an ongoing
research effort which is aimed at achieving gains from program testing
while not ensuring perfection: namely, program mutation [4]. It has been
observed [6] that the vast majority of errors that remain in software, once
it has been tested and put into production, tend not to be radical
errors(2) but rather are interacting combinations of simple errors.
Indeed, there are many "horror" stories similar to the failure of an early
Vanguard missile launch because of a missing right parenthesis in a
controlling program. So a reasonable goal of program testing is to rule out
all combinations of simple errors: that is, design a program testing
method with the goal being that if a program passes the test then either

(1) the program is correct, or

(2) the program is radically incorrect.

But even this seems too ambitious if one attacks it directly. First, given a
program we must be able to generate all of its simple errors. Assuming
that this can be done, we next must eliminate the simple errors and the
complex errors which emanate from their combinations. Clearly the number
of complex errors will be a combinatoric explosion in the number of simple
errors. While it may be feasible to eliminate all simple errors, explicit
elimination of all complex errors appears intractable.

The goal of the program mutation testing methodology is to establish
that a given program is either correct or radically incorrect. Let L be
the programming language under consideration. A mutant operator is a
simple program transformation, dependent on L, which produces mutant
programs of a given program P. The mutants are also programs in L.


(2) There are no agreed on technical definitions of error categories. We
too will be informal. By radical we mean errors due to grossly
misunderstanding the program specifications - errors which are
difficult if not impossible to capture by general algorithmic methods
but which would easily be observed by almost any test or when the
software is first put into production.




The goal of the mutant operator is to introduce simple errors in P, thus
producing mutants of P. Alternatively, if P is incorrect due to a single
simple error, some mutant would be a correct program for the given task.
There should be several mutant operators, each corresponding to a different
class of simple errors that may occur in L. Let M(P) denote the set of
all mutants of P. Ideally, M(P) should contain mutants corresponding to
all and only the possible simple errors. However, this is too ambitious a
goal for general purpose program transformations and we relax the
requirement to be that M(P) covers all simple errors in the sense that M(P)
may also contain mutants which are equivalent to P. We let M*(P) denote
all the mutants of P which come from multiple applications of mutant
operators on P. These mutants are also programs in L.

Let D be the input domain of P. P is said to pass the mutation test
if there exists test data T, a subset of D, such that

(1) P is OK(T), and

(2) for each mutant m in M(P) either

(a) m is not OK(T), or

(b) m is equivalent to P.
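The definition can be read as a simple check over a set of mutants. The Python sketch below is an illustration under stated assumptions: OK(T) is taken to mean that a program gives the expected result on every element of T, the program and its three mutants are toy examples, and equivalence is supplied by the tester as a set of mutant names, since it cannot in general be decided mechanically.

```python
def is_ok(program, tests):
    """OK(T), assumed here to mean: correct on every (input, expected) pair."""
    return all(program(x) == expected for x, expected in tests)


def passes_mutation_test(program, mutants, tests, equivalent=()):
    """Conditions (1) and (2) of the mutation test for a given T."""
    if not is_ok(program, tests):
        return False
    return all(name in equivalent or not is_ok(m, tests)
               for name, m in mutants.items())


program = lambda i: i * 2                 # hypothetical P
mutants = {                               # hypothetical M(P)
    "I**2": lambda i: i ** 2,
    "I+2": lambda i: i + 2,
    "I*2*1": lambda i: i * 2 * 1,         # equivalent to P
}
weak = [(0, 0), (2, 4)]
strong = [(0, 0), (2, 4), (3, 6)]
```

With `weak`, the mutant `I**2` is still OK(T), so the test is not passed. With `strong` and `equivalent={"I*2*1"}` every mutant either fails or has been declared equivalent, and `passes_mutation_test` returns `True`.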

If P passes the mutant test then we are sure that P is free of simple
errors. But what of complex errors? To this end we have observed a
coupling effect which states:

Test data T which causes all the non-equivalent mutants of M(P)
to fail is so sensitive that all the non-equivalent mutants of
M*(P) must also fail on T.
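A toy experiment in Python shows how the coupling effect might be checked empirically on a very small mutant space. Everything here is an assumption made for illustration: the program is the single statement M = L + 2*K - 1, a "simple error" is a change to one of four positions (two operators and two constants), and the alternatives offered at each position are arbitrary.

```python
from itertools import combinations, product

ORIGINAL = {"op1": "+", "c": 2, "op2": "-", "d": 1}    # M = L + 2*K - 1
ALTERNATIVES = {"op1": ["-", "*"], "c": [1, 3], "op2": ["+"], "d": [0, 2]}
APPLY = {"+": lambda x, y: x + y,
         "-": lambda x, y: x - y,
         "*": lambda x, y: x * y}


def evaluate(p, l, k):
    """Computes (L op1 c*K) op2 d for the parameter set p."""
    return APPLY[p["op2"]](APPLY[p["op1"]](l, p["c"] * k), p["d"])


def mutants(order):
    """All variants that differ from ORIGINAL in exactly `order` positions."""
    result = []
    for slots in combinations(ALTERNATIVES, order):
        for choice in product(*(ALTERNATIVES[s] for s in slots)):
            result.append({**ORIGINAL, **dict(zip(slots, choice))})
    return result


def survivors(candidates, tests):
    """Mutants that agree with ORIGINAL on every test point (L, K)."""
    return [m for m in candidates
            if all(evaluate(m, l, k) == evaluate(ORIGINAL, l, k)
                   for l, k in tests)]


TESTS = [(3, 3), (1, 2)]
```

With `TESTS`, all 7 first-order mutants fail, and so do all 18 second-order mutants, as the coupling effect predicts. The effect is an observation rather than a guarantee: with the single point (3, 3) every first-order mutant still fails, yet one second-order mutant, M = L*(1*K) - 1, survives until the second point is added.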

The justification of the coupling effect parallels the probabilistic
argument for justifying the single fault methods used to test circuits;
however, we have no theory to make it a hard-and-fast principle. Basically, if
several simple errors (detectable by T) combine to make a complex error
then it is extremely unlikely the simple errors will cancel to allow the
successful execution on T of the mutant containing the complex error. The
goal of program mutation theory is then to validate, depending on L either
deductively or experimentally, the coupling effect for language L by
establishing the following metatheorem of program mutation:

If  P  passes  the  mutation  test  then  either 

(1)  P  is  correct,  or 

(2)  P  is  radically  incorrect. 

In [1] the mutation metatheorem has been formally shown to hold where L is
certain classes of decision tables and the mutant operators involve the
reformulation of conditions and applied actions. Currently, programs which
manipulate data structures are under investigation.

For general purpose programming languages, such as FORTRAN, the task
is more difficult. There is a noticeable lack of empirical studies on
programming errors to draw on in formulating a complete(3) set of mutant
operators - a necessary requirement for program mutation to be deductive.
Hence, at least for now, in the case of general purpose languages we can
consider program mutation as an inductive tool for gaining confidence that
the metatheorem of program mutation holds for a particular program P. A
prototype system for a subset of FORTRAN has been implemented [2], and some
initial experience with it, the effectiveness of the implemented mutant
operators, and substantiations of the coupling effect can be found in [3].
A mutation system for ANSI FORTRAN is in the design stage. Several
experiments for finding "good" mutant operators and for evaluating the
effectiveness of mutation testing are under consideration.

(3) Here, complete means that all simple errors will be captured in M(P).

Some final comments on performing the mutation test are in order.
Clearly the size of M(P) and T must be small. Our view is that the
mutation system should be interactive. The user specifies the program P
and initial test data T1 to the system, whence the mutants are generated and
executed on T1. A list of mutants which fail and which succeed on T1 is
produced. The user must then examine his results to decide whether

(1) P contains a non-radical error.

(2) Because mutants which should have failed didn't, T1 must be
augmented to T2 and the system re-run.

(3) Some mutants are equivalent to P. There is hope here that
symbolic execution techniques can partially automate this task.

This  cycle  can  be  viewed  as  a  session  in  which  the  user  defends  P  and  the 
current  test  data  against  a  system  adversary  which  asks  questions  of  the 
form,  "Why  doesn't  your  test  data  distinguish  this  simple  error?"  Such  an 
adversary  forces  the  user  of  program  mutation  into  a  careful  and  detailed 
review  of  his  program  and  the  design  decisions  made  in  constructing  it.  In 
this  view  we  hold  hope  that  even  radical  errors  can  be  uncovered  by  users 
of  program  mutation. 

REFERENCES 

[1] T. Budd and R. Lipton, "Mutation Analysis of Decision Table Programs", to
appear at the 1978 Conference on Information Sciences and Systems,
Johns Hopkins University.

[2] T. Budd, R. DeMillo, R. Lipton and F. Sayward, "The Design of a Prototype
Mutation System for Program Testing", to appear at the 1978 National
Computer Conference.

[3] R. DeMillo, R. Lipton and F. Sayward, "Hints on Test Data Selection", to
appear in Computer, April 1978.

[4] R. DeMillo, R. Lipton and F. Sayward, "PROGRAM MUTATION: A Method of
Determining Test Data Adequacy", to appear, 1978.

[5] J. Goodenough and S. Gerhart, "Toward a Theory of Testing: Data Selection
Criteria", IEEE Trans. on Soft. Eng. SE-1, 2 (June 1975), pp. 156-173.

[6] E.A. Youngs, "Human Errors in Programming", International Journal of Man
Machine Studies 6 (1974), pp. 361-376.




THE  STATUS  OF  RESEARCH  ON  PROGRAM  MUTATION 

Richard J. Lipton (*) and Frederick G. Sayward (**)

December 1978

ABSTRACT

A status report on two new program mutation systems is given. The first is
the EXPER system for testing programs written in ANSI FORTRAN and for
experimenting on the concepts of program mutation. It has been designed
and implemented, and is in its final debugging stages. The second is the
CPMS system for testing programs written in a COBOL subset. This system is
in its final design stage.

Also, the results of a new experiment on substantiating the "coupling
effect" for our FORTRAN systems are presented.


(*) Department of Electrical Engineering, University of California at
Berkeley, Berkeley, California 94720

(**) Department of Computer Science, Yale University, New Haven,
Connecticut 06520

This research was supported in part by Georgia Institute of Technology
subcontracts under the sponsorship of AIRMICS research grant
DAAG29-78-G-0121.


INTRODUCTION

Program mutation is a relatively new approach to program testing
which, unlike traditional methods, attempts to exploit the fact that good
programmers write code which is "close" to being correct. Traditionally,
the fundamental question addressed in program testing has been:

    Given that a program P is known to work on test data T, can
    we conclude that P works in general?

As expected, the traditional question is theoretically unanswerable [5].
However, program testing researchers have made advances in providing
definite answers for special cases [6,10] and, for the general case, have
provided methods [3,9,11,12] for gaining confidence in a positive answer.

Program mutation, on the other hand, has striven to answer a weaker
yet quite realistic question. The formulation of this weaker question is
based on what we call the competent programmer assumption:

    A competent programmer, after several iterations and on
    deeming that his job of designing, coding, and testing is
    complete, has written a program that is either correct or
    "almost" correct in the sense that it differs from a correct
    program in only simple ways.


As a simple example, suppose we want a FORTRAN program that computes the
distance from the origin to an N-dimensional vector X where the distance is
defined to be the square root of the sum of the squares of the elements of
X. We would accept the following incorrect program as being written by a
competent programmer:

      PROGRAM P1
      SUM=1
      DO 1 I=1,N
      SUM=SUM+X(I)**2
    1 DIST=SQRT(SUM)

But we would question the competence of a programmer who produced

      PROGRAM P2
      DIST=X(1)
      DO 1 I=1,N
    1 DIST=MAX(X(I),DIST)

With the competent programmer assumption, the question addressed in program
mutation becomes:

    Given that P is written by a competent programmer and that P
    is known to work on test data T, can we conclude that P
    works in general?

Note that the mutation question differs philosophically from the
traditional testing question in a very important way: traditionally, a
program is treated as a random object, whereas in program mutation a
program is assumed to be either correct or almost correct, a mutant of a
correct program. Thus program P1 above is a mutant of the correct program
P for the distance problem:

      PROGRAM P
      SUM=0.0
      DO 1 I=1,N
    1 SUM=SUM+X(I)**2
      DIST=SQRT(SUM)

P2, on the other hand, is not a mutant of P.

To apply program mutation, we choose the method of eliminating the
alternatives: developing a test set T on which the program P is correct
but on which all mutants of P fail. In practice there are far too many
mutants of P to consider. But by concentrating on the "first order"
mutants of P the methodology becomes tractable. First order mutants of P
come from a single application of a mutant operator, a simple syntactic or
semantic program transformation such as (1) changing a particular instance
of a relational operator to one of the five other relational operators, or
(2) changing the label part of a particular GOTO statement to one of the
other labels appearing in the program. We then rely on the coupling
effect:

    Test data that causes all first-order mutants of a program
    to fail is so sensitive that all higher-order mutants of the
    program will also fail.
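As a rough illustration of mutant operator (1), the Python sketch below generates the first-order mutants of a single FORTRAN statement by replacing one relational operator at a time with each of the five others. The function name `relational_mutants` and the purely textual treatment of the statement are assumptions made for this sketch; it is not how any of the mutation systems discussed here are implemented.

```python
import re

# The six FORTRAN relational operators.
RELATIONAL_OPS = [".LT.", ".LE.", ".EQ.", ".NE.", ".GE.", ".GT."]
_REL_PATTERN = re.compile(r"\.(LT|LE|EQ|NE|GE|GT)\.")


def relational_mutants(statement):
    """Return every statement obtained by changing exactly one relational
    operator occurrence to one of the five other relational operators."""
    mutants = []
    for match in _REL_PATTERN.finditer(statement):
        for op in RELATIONAL_OPS:
            if op != match.group(0):
                mutants.append(
                    statement[:match.start()] + op + statement[match.end():]
                )
    return mutants


one_op = relational_mutants("IF (I.NE.J) GOTO 10")
three_ops = relational_mutants("IF (I.LE.0.OR.J.LE.0.OR.K.LE.0) GOTO 500")
```

A statement with one relational operator yields five mutants, and one with three such operators yields fifteen, each differing from the original in a single operator.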

To illustrate, the following two programs are first-order mutants of
program P above:

      PROGRAM M1                    PROGRAM M2
      SUM=1                         SUM=0.0
      DO 1 I=1,N                    DO 1 I=1,N
    1 SUM=SUM+X(I)**2               SUM=SUM+X(I)**2
      DIST=SQRT(SUM)              1 DIST=SQRT(SUM)

Program P1 is a mutant but not a first-order mutant of P. By the coupling
effect, if P is correct on test data T while M1 and M2 fail, then P1 must
also fail on T.

With this formulation, the effectiveness of program mutation now
depends on the validity of two assumptions: the competent programmer
assumption and the coupling effect. In practice, theoretical studies
notwithstanding [2], it is not necessary to show formally that these
assumptions hold in order for program mutation to be a useful tool for
testing real programs written in real programming languages. We have found
that in performing mutant tests on an incorrect program the user is forced
into developing test data on which the program fails [1]. So we are
interested in building interactive systems to aid programmers in performing
mutant tests and in evaluating the effectiveness of the approach. We pick
a real programming language L and, based on the literature and our personal
experience, define an appropriate set of mutant operators for L. Then we
build a man-machine mutation system that aids in performing mutant tests
for L and the chosen mutant operators. Using the system and other aids, we
then perform experiments to substantiate the competent programmer
assumption and the coupling effect for L and the given mutant operators;
we also check to see how effective the system is as a testing tool.

So far we have introduced program mutation [4] and built a pilot
mutation system, called PIMS, for a subset of FORTRAN [1]. With PIMS we
wanted to gain some initial experience with mutations and building mutation
systems. The subset consisted of a single FORTRAN subroutine with DO, IF,
GOTO, and assignment statements as the control structures. The data
structures were integers with arrays of up to two dimensions. The mutant
operators were of four classes: declaration, data reference, operator
evaluation, and control flow. We discussed user methods of determining the
correctness of a program on test data, automatic detection of mutant
failure, mutants equivalent to a given program, non-terminating mutants,
and managing the n^2 mutants generated by applying the mutant operators,
where n is the number of executable statements in the program. We have
also done experiments on the competent programmer assumption [1] and the
coupling effect [5].

Now we would like to report on two new program mutation systems, one
for nearly full ANSI FORTRAN, the other for a COBOL subset, and on a new,
stronger experiment for substantiating the coupling effect for our FORTRAN
systems.

THE  EXPER  SYSTEM 

In working with PIMS, we observed that the test data most programmers
intuitively feel is good, as well as test data generated by automatic means,
either randomly or by symbolic execution, do poorly with respect to the
mutant test. Perhaps this gives evidence as to why program testing has
traditionally been held in such low esteem. We admit that PIMS wasn't very
flexible in its design and consequently we were able to perform only
limited experiments with this system.

The EXPER system has as its language ANSI FORTRAN minus I/O and
complex arithmetic. Its mutant operators are basically the same as in
PIMS. The system was built at Yale on the DEC-20 and is nearly debugged.
Recently, it has begun to be transported to the VAX computer at UC
Berkeley. Among the goals of this system are (1) determining how program
mutation can be integrated with the design, coding, and testing of
multi-module programs, (2) determining whether the mutant operators of PIMS
are sufficient for the additional data and control structures allowed in
EXPER, (3) further experiments on the coupling effect, and (4) experiments
on the effectiveness of the method.

Besides extending the language, there is another major difference
between PIMS and EXPER. PIMS was designed solely as a user-oriented
system: the user submits a program and a set of test data and selects
which system-defined mutant operators are to be applied to the program.
EXPER, on the other hand, is organized around the concept of an experiment
which consists of a program, a set of test data, a subset of the system's
mutant operators which may be applied to the program, and a further subset
of this subset which will be applied to the program. The experimenter is
easily able to generate slight differences in each of these elements and
then monitor the progress of subjects using EXPER to perform the mutant
test.

One current experiment involves the redundancy of mutant operators.
Such redundancy could be counter-productive if time is spent constructing
test data that don't significantly increase one's confidence in the
correctness of a program. The aim of this experiment is to detect
redundant mutant operators by statistical methods. The subjects are
divided into two groups. Each group is given several programs and asked to
develop test data by doing mutation analysis. Some of the programs contain
bugs which the subjects are to try to find. The difference between groups
is in what mutant operators they may apply to the programs. Group 1 is
allowed to use all implemented operators while group 2 may use all but the
operator(s) in question. The variables to be compared are the number of
bugs located and the time used in locating them.

Several other experiments are currently being formulated, such as
experiments to evaluate executing only some mutants versus executing all
mutants on given test data. We plan to report on these experiments in a
future paper.


THE CPMS SYSTEM

The design of a COBOL pilot mutation system, called CPMS, is in its
final stages at Georgia Tech. The COBOL subset for this project consists
of a single COBOL procedure with sentences of the MOVE, COMPUTE,
PERFORM, READ, WRITE type as its control structures, character and decimal
scalar variables with the record feature as its data structures, and up to
two sequential input and two sequential output files as its I/O structures.
The mutant operators of CPMS will be similar to those of PIMS with major
additions for data structures and I/O.

The CPMS design is based on PIMS: an interactive man-machine system to
perform the mutant test for programs written in the COBOL subset. CPMS,
however, will be more flexible than PIMS in its experimental capabilities.

Aside from applying program mutation to a new language, the major
issue addressed in CPMS is the I/O problem, which was avoided in PIMS and
EXPER. As in FORTRAN, a COBOL mutant may fail in one of three ways: it may
have an execution fault, it may time out, or it may produce incorrect
answers. Because the output of an average COBOL program tends to be much
larger than that of an average FORTRAN program, it is not clear whether
there is an efficient way to check for this third kind of failure. We
intend to try the following scheme:

(1) A mutant fails if it tries to read a longer record than the
    program read.

(2) A mutant fails if it reads fewer (more) records than the
    program read.

(3) A mutant fails if it writes fewer (more) records than the
    program wrote.

(4) A mutant fails if it produces files that are unequal to the
    files that the program produced. Here the user specifies
    whether a strong or a weak equality check is to be used. Two
    files are strongly equal if their records match character for
    character; they are weakly equal if the non-blank characters
    of their records match.

We hope that the vast majority of the COBOL mutants will fail before step 4
is invoked. Of course, we are not certain whether a mutant that fails on
some steps of the scheme should not be allowed to continue anyway. Part of
the COBOL mutation project will be experimenting to find a realistic
definition of mutant failure on I/O.
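The output side of the scheme, steps (3) and (4), can be sketched as follows. This is a hedged illustration only: files are modelled as Python lists of record strings, and the names `files_equal` and `mutant_fails_on_output` are hypothetical rather than part of the CPMS design.

```python
def _non_blank(record):
    """Return the record with all blank characters removed."""
    return record.replace(" ", "")


def files_equal(file_a, file_b, strong=True):
    """Strong equality: records match character for character.
    Weak equality: the non-blank characters of the records match."""
    if len(file_a) != len(file_b):
        return False
    if strong:
        return all(a == b for a, b in zip(file_a, file_b))
    return all(_non_blank(a) == _non_blank(b) for a, b in zip(file_a, file_b))


def mutant_fails_on_output(program_out, mutant_out, strong=True):
    """Apply step (3), then step (4), to one output file."""
    if len(mutant_out) != len(program_out):      # step (3): record counts differ
        return True
    return not files_equal(program_out, mutant_out, strong)  # step (4)


program_out = ["TOTAL  10", "END"]
spacing_mutant = ["TOTAL 10", "END"]     # differs only in blanks
short_mutant = ["TOTAL  10"]             # writes one record fewer

strong_result = mutant_fails_on_output(program_out, spacing_mutant, strong=True)
weak_result = mutant_fails_on_output(program_out, spacing_mutant, strong=False)
short_result = mutant_fails_on_output(program_out, short_mutant, strong=False)
```

The cheap record-count comparison runs first, so the character-level comparison is reached only for mutants that wrote the same number of records as the original program.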

STRONGER  SUBSTANTIATION  OF  THE  COUPLING  EFFECT 

We have already reported on an experiment [5] involving Hoare's FIND
program [7] that supplied empirical evidence for the coupling effect. The
experiment went as follows:

(1) We derived a test data set T of cases to pass the mutant
    test. (The large size of T was due to our inexperience.)

(2) For efficiency reasons, we reduced T heuristically to a test
    data set T' consisting of seven cases on which FIND also
    passed the mutant test.

(3) Random k-order mutants of FIND, k>1, were generated. (A
    k-order mutant comes from k applications of mutant operators
    on the program P.)

(4) The k-order mutants of FIND were then executed on T'.

The coupling effect says that the non-equivalent k-order mutants of FIND
will fail on T'. Note that step 2 biases the experiment against the
coupling effect since it removes the man-machine orientation of our
approach to testing. We would have been quite happy to find a
counterexample to the coupling effect for PIMS, since it would have allowed
us to improve the set of mutant operators. The results of the experiment,
though, gave evidence that we had chosen a well coupled set of mutant
operators for PIMS:

    k      Number of k-order mutants      Number successful on T'

    2              21100                           19

    >2              1500                            0

The 19 successful mutants were shown to be equivalent to FIND. We
concentrated on the k=2 case since, intuitively, the more one mutates FIND
the more likely one is to get a program that violates the competent
programmer assumption.

The major criticism of the experiment concerns step 3. Since the
first-order mutants that compose the k-order mutants are independently
drawn, the resulting k-order mutant is likely to be very unstable and
subject to quick failure, in contrast to the more desirable case where the
k-order mutant contains subtly related changes that correspond to the
subtle errors programmers find so hard to detect.

The current experiment on the coupling effect, which uses EXPER rather
than PIMS, omits step 2 above and makes the following important change to
step 3:

(3) Randomly generate correlated k-order mutants of the program.

By correlated we mean that each of the k applications of mutant operators
will in some way be related to all of the others: they could for instance
affect the same statement of P, or the same variable name, or the same
statement label, or the same constant. Once again, if P passes the mutant
test with test data T, the coupling effect says that the correlated k-order
mutants of P will fail on T.
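The text does not say how correlated mutants are actually drawn, so the following Python sketch shows just one plausible reading: group the available first-order mutations by the variable they involve, then sample k of them from a single group. The `Mutation` record, the function `draw_correlated`, and the small mutation pool are all hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical description of one application of a mutant operator.
Mutation = namedtuple("Mutation", "statement variable new_text")


def draw_correlated(mutations, k, rng):
    """Draw k distinct first-order mutations that all involve the same variable."""
    by_variable = {}
    for m in mutations:
        by_variable.setdefault(m.variable, []).append(m)
    candidates = [group for group in by_variable.values() if len(group) >= k]
    if not candidates:
        raise ValueError("no variable is involved in at least k mutations")
    group = rng.choice(candidates)
    return rng.sample(group, k)


# Illustrative pool of first-order mutations (statement numbers are examples only).
pool = [
    Mutation(9, "K", "MATCH = MATCH + K"),
    Mutation(29, "K", "IF(MATCH .NE. K) GOTO 300"),
    Mutation(36, "K", "IF(J + J .LE. I) GOTO 500"),
    Mutation(6, "I", "MATCH = MATCH + I"),
    Mutation(3, "J", "MATCH = J"),
]

drawn = draw_correlated(pool, 3, random.Random(0))
```

Correlation by statement, label, or constant could be handled the same way by grouping on a different attribute.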

For this experiment three programs are being used: FIND, STKSIM, and
TRIANG. STKSIM is a program that maintains a stack and allows the standard
operations of clear, push, pop, and top. TRIANG is a program that, given
the lengths of the three legs of a triangle, categorizes the input as not
representing a triangle or as representing a scalene, isosceles or
equilateral triangle [3]. The following is a summary of the results of the
experiment so far:

    PROGRAM          k=2                  k=3                  k=4
                number  successes    number  successes    number  successes

    FIND         3000      2          3000      0          3000      0

    STKSIM       3000      3          3000      0          3000      0

    TRIANG       3000      1          3000      1          3000      0

In all cases, the successful correlated k-order mutants have been shown to
be equivalent to the original program. The detailed results of the
experiment on TRIANG are listed in the appendix.

We have yet to find a non-trivial counterexample to the coupling
effect for our FORTRAN systems. The one successful 3-order mutant of
TRIANG deserves closer examination; indeed, we initially felt that it was a
non-equivalent mutant. The mutant is

              SUBROUTINE TRIANG(I,J,K,MATCH)
        C
              INTEGER I,J,K,MATCH
        C
        C     MATCH IS OUTPUT FROM THE ROUTINE:
        C        MATCH = 1 IF TRIANGLE IS SCALENE
        C        MATCH = 2 IF TRIANGLE IS ISOSCELES
        C        MATCH = 3 IF TRIANGLE IS EQUILATERAL
        C        MATCH = 4 IF NOT TRIANGLE
        C
 1  2         IF (I.LE.0.OR.J.LE.0.OR.K.LE.0) GOTO 500
 3            MATCH=0
 4  5         IF (I.NE.J) GOTO 10
 6            MATCH=MATCH+1
 7  8      10 IF (I.NE.K) GOTO 20
 9            MATCH=MATCH+2
                  MO1: change statement 9 to MATCH=MATCH+K
10 11      20 IF (J.NE.K) GOTO 30
12            MATCH=MATCH+3
13 14      30 IF (MATCH.NE.0) GOTO 100
15 16         IF (I+J.LE.K) GOTO 500
17 18         IF (J+K.LE.I) GOTO 500
19 20         IF (I+K.LE.J) GOTO 500
21            MATCH=1
22            RETURN
23 24     100 IF (MATCH.NE.1) GOTO 200
25 26         IF (I+J.LE.K) GOTO 500
27        110 MATCH=2
28            RETURN
29 30     200 IF (MATCH.NE.2) GOTO 300
                  MO2: change statement 29 to IF (MATCH.NE.K)
31 32         IF (I+K.LE.J) GOTO 500
33            GOTO 110
34 35     300 IF (MATCH.NE.3) GOTO 400
36 37         IF (J+K.LE.I) GOTO 500
                  MO3: change statement 36 to IF (J+J.LE.I)
38            GOTO 110
39        400 MATCH=3
40            RETURN
41        500 MATCH=4
42            RETURN
              END

Note that the correlation is with respect to variable K. The mutant
operators MO1 and MO2 produce incorrect mutants while MO3 produces a mutant
equivalent to TRIANG. Yet the 3-order correlated mutant is equivalent to
TRIANG.

This makes a beautiful illustration of the part of the programming
process that program mutation is trying to exploit. Using the constant 2
in statements 9 and 29 is an arbitrary but coupled decision. Indeed, you
can replace both instances of 2 by any positive constant (or any variable
whose value doesn't change between the execution of statements 9 and 29)
and you get an equivalent program; replace only one instance and you get
an incorrect program. In a sense, the constant 2 in statements 9 and 29 is
what would be called in the terminology of formal logic a "bound
variable".


ACKNOWLEDGEMENTS

We wish to thank Alan Acree, Tim Budd, Jim Burns, Rich DeMillo, Edie
Martin, and Dan St. Andre for their contributions to the program mutation
effort and also to thank Mary-Claire van Leunen for the careful editing and
stylistic mutations she has made to our initial draft.


REFERENCES

[1] T.A.Budd, R.A.DeMillo, R.J.Lipton, and F.G.Sayward, "The Design of a
Prototype Mutation System for Program Testing", Proceedings of the
1978 National Computer Conference, pp. 623-627.

[2] T.A.Budd and R.J.Lipton, "Mutation Analysis of Decision Table
Programs", Proceedings of the 1978 Conference on Information Sciences
and Systems, pp. 346-349.

[3] L.A.Clarke and J.L.Woods, "Program Testing Using Symbolic Execution",
presented at the Navy Laboratory Computing Committee Symposium on
Software Specification and Testing Technology, April 1978.

[4] R.A.DeMillo, R.J.Lipton, and F.G.Sayward, "PROGRAM MUTATION: A New
Approach to Program Testing", presented at the Navy Laboratory
Computing Committee Symposium on Software Specification and Testing
Technology, April 1978.

[5] R.A.DeMillo, R.J.Lipton, and F.G.Sayward, "Hints on Test Data
Selection: Help for the Practicing Programmer", Computer 11,4 (April
1978), pp. 34-41.

[6] M.Geller, "Test Data as an Aid in Proving Program Correctness", in
Proc. of the Third ACM Symp. on the Principles of Programming
Languages (1976), pp. 209-218.

[7] C.A.R.Hoare, "Algorithm 65: FIND", Comm. of the ACM 4,7 (April 1961),
pp. 321.

[8] W.E.Howden, "Reliability of the Path Analysis Testing Strategy", IEEE
Trans. on Soft. Eng. SE-2,3 (Sept. 1976), pp. 208-215.

[9] W.E.Howden, "Methodology for the Generation of Program Test Data",
IEEE Trans. on Computers C-24,5 (May 1975), pp. 554-560.

[10] A.Lew and D.Tamanaha, "Decision Table Programming and Reliability", in
Proc. of the Second International Conf. on Reliable Software (1976),
pp. 345-349.

[11] L.J.Osterweil and L.D.Fosdick, "Some Experiences with DAVE - A Fortran
Program Analyzer", AFIPS Conference Proceedings 45 (1976),
pp. 909-915.

[12] C.V.Ramamoorthy, S.F.Ho, and W.T.Chen, "On the Automated Generation of
Program Test Data", IEEE Trans. on Soft. Eng. SE-2,4 (Dec. 1976),
pp. 293-300.


APPENDIX 


This appendix lists the output generated by EXPER and an associated
experimental subsystem for performing the correlated k-order mutation
experiment on program TRIANG.


LISTING OF THE PROGRAM BEING MUTATED


      SUBROUTINE TRIANG(I,J,K,MATCH)
      IF(I .LE. 0 .OR. J .LE. 0 .OR. K .LE. 0) GOTO 500       1   2
      MATCH = 0                                               3
      IF(I .NE. J) GOTO 10                                    4   5
      MATCH = MATCH + 1                                       6
   10 IF(I .NE. K) GOTO 20                                    7   8
      MATCH = MATCH + 2                                       9
   20 IF(J .NE. K) GOTO 30                                   10  11
      MATCH = MATCH + 3                                      12
   30 IF(MATCH .NE. 0) GOTO 100                              13  14
      IF(I + J .LE. K) GOTO 500                              15  16
      IF(J + K .LE. I) GOTO 500                              17  18
      IF(I + K .LE. J) GOTO 500                              19  20
      MATCH = 1                                              21
      RETURN                                                 22
  100 IF(MATCH .NE. 1) GOTO 200                              23  24
      IF(I + J .LE. K) GOTO 500                              25  26
  110 MATCH = 2                                              27
      RETURN                                                 28
  200 IF(MATCH .NE. 2) GOTO 300                              29  30
      IF(I + K .LE. J) GOTO 500                              31  32
      GOTO 110                                               33
  300 IF(MATCH .NE. 3) GOTO 400                              34  35
      IF(J + K .LE. I) GOTO 500                              36  37
      GOTO 110                                               38
  400 MATCH = 3                                              39
      RETURN                                                 40
  500 MATCH = 4                                              41
      RETURN                                                 42
      END


LISTING OF THE TEST CASES ON WHICH TRIANG PASSES THE MUTANT TEST

(? marks an entry that is illegible in the source copy.)

CASE NUMBER        LENGTHS          TRIANGLE TYPE
                 I     J     K

     1           0     0     0           N
     2           3     4     8           N
     3           1     1     1           E
     4           1     3     ?           ?
     5           3     4     6           S
     6           8     4     3           N
     7           3     8     4           N
     8           2     2     5           N
     9           2     5     3           N
    10           2     3     3           I
    11           5     2     2           N
    12           0     1     1           N
    13           1     0     1           N
    14           1     1     0           N
    15           5     9     9           I
    16           9     5     9           ?
    17           9     9     5           ?
    18          -1     1     1           N
    19           1    -1     ?           N
    20           1     1    -1           N
    21           4     5     9           N
    22           9     4     5           N
    23           4     9     5           N
    24           4     4     8           N
    25           4     3     4           ?
    26           3     4     4           I
    27           3     5     5           ?
    28           5     9     6           S
    29           5     ?     6           S
    30           3     9     5           N
    31           3     9     ?           ?
    32          10    10    13           I
    33          10    13    10           ?
    34          13    10    10           ?
    35           6     7     4           S
    36           7     3     6           S
    37          10     5     5           ?


FINAL STATISTICS FOR PASSING THE MUTANT TEST ON TRIANG


MUTANT STATE FOR ALL PROGRAM UNITS

FOR EXPERIMENT "EIGHT.EXP"    THIS IS RUN 3

NUMBER OF TEST CASES = 37

NUMBER OF MUTANTS = 1026
NUMBER OF DEAD MUTANTS  = 955 (93.1%)
NUMBER OF LIVE MUTANTS  =   0 ( 0.0%)
NUMBER OF EQUIV MUTANTS =  71 ( 6.9%)

NUMBER OF MUTATABLE STATEMENTS = 42
GIVING A MUTANTS/STATEMENT RATIO OF 24.43


MUTANT ELIMINATION PROFILE FOR ALL PROGRAMS

MUTANT TYPE                 TOTAL    DEAD            LIVE           EQUIV

CONSTANT REPLACEMENT           30      30  100.0%      0   0.0%       0   0.0%
SCALAR VARIABLE REPLACEME     126     120   95.2%      0   0.0%       6   4.8%
SCALAR FOR CONSTANT REP.       60      60  100.0%      0   0.0%       0   0.0%
CONSTANT FOR SCALAR REP.      170     168   98.8%      0   0.0%       2   1.2%
SOURCE CONSTANT REPLACEME      36      36  100.0%      0   0.0%       0   0.0%
UNARY OPERATOR INSERTION      205     149   72.7%      0   0.0%      56  27.3%
ARITHMETIC OPERATOR REPLA      63      61   96.8%      0   0.0%       2   3.2%
RELATIONAL OPERATOR REPLA      80      76   95.0%      0   0.0%       4   5.0%
LOGICAL CONNECTOR REPLACE       6       6  100.0%      0   0.0%       0   0.0%
STATEMENT ANALYSIS             42      42  100.0%      0   0.0%       0   0.0%
STATEMENT DELETION             42      42  100.0%      0   0.0%       0   0.0%
RETURN STATEMENT REPLACEM      38      37   97.4%      0   0.0%       1   2.6%
GOTO STATEMENT REPLACEMEN     128     128  100.0%      0   0.0%       0   0.0%
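The summary percentages in the final statistics follow directly from the mutant counts. The short Python check below recomputes them from the reported totals; the variable names and the one-decimal rounding are assumptions about how the figures were produced.

```python
# Counts reported in the final statistics for TRIANG.
total_mutants = 1026
dead_mutants = 955
live_mutants = 0
equiv_mutants = 71
mutatable_statements = 42


def percent(part, whole):
    """Percentage of the whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


dead_pct = percent(dead_mutants, total_mutants)
live_pct = percent(live_mutants, total_mutants)
equiv_pct = percent(equiv_mutants, total_mutants)
mutants_per_statement = round(total_mutants / mutatable_statements, 2)

# Every mutant is accounted for as dead, live, or equivalent.
accounted_for = dead_mutants + live_mutants + equiv_mutants
```

With no live mutants left, every mutant not shown to be equivalent to TRIANG has been eliminated by the 37 test cases, which is what passing the mutant test means here.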


RESULTS FOR THE GENERATION OF 2-ORDER MUTANTS OF TRIANG

(In the profiles below, ? marks a digit that is illegible in the source copy.)

************************************************************************

THE FOLLOWING 2-ORDER MUTANT OF TRIANG SUCCEEDED

MUTANT PHYSICAL RECORD IS 428
STATEMENT 1 CHANGED FROM
      IF(I .LE. 0 .OR. J .LE. 0 .OR. K .LE. 0) GOTO 500
TO
      IF(I .LE. -0 .OR. J .LE. 0 .OR. K .LE. 0) GOTO 500

MUTANT PHYSICAL RECORD IS 726
STATEMENT 13 CHANGED FROM
   30 IF(MATCH .NE. 0) GOTO 100
TO
   30 IF(MATCH .GT. 0) GOTO 100

************************************************************************


WITH THE ORDER AT 2

THE NUMBER OF CORRELATED MUTANTS OF TRIANG DRAWN WAS 3000
OF THOSE THE NUMBER OF LIVE DRAWS WERE 1


PROFILE OF EQUIVALENT COMPONENTS
NO. EQU MTS     NUMBER DRAWN     NO. SUCCESSFUL

     0              2819               0
     1               180               0
     2                 1               1


PROFILE ON METHOD OF 2-ORDER MUTANT FAILURE

25?6  TERMINATED BUT PRODUCED WRONG ANSWERS
   0  HAD AN ARITHMETIC FAULT
   0  HAD AN ARRAY INDEXING ERROR
   0  EXECUTED A TRAP STATEMENT
 13?  REFERENCED AN UNDEFINED VARIABLE
   0  ATTEMPTED TO DIVIDE BY ZERO
   ?  EXCEEDED THE TIME LIMIT
   ?  ATTEMPTED ILLEGAL DATA COERSION
 191  ATTEMPTED TO ALTER A READ ONLY VARIABLE


RESULTS FOR THE GENERATION OF 3-ORDER MUTANTS OF TRIANG

************************************************************************

THE FOLLOWING 3-ORDER MUTANT OF TRIANG SUCCEEDED

MUTANT PHYSICAL RECORD IS 204
STATEMENT 29 CHANGED FROM
  200 IF(MATCH .NE. 2) GOTO 300
TO
  200 IF(MATCH .NE. K) GOTO 300

MUTANT PHYSICAL RECORD IS 147
STATEMENT 36 CHANGED FROM
      IF(J + K .LE. I) GOTO 500
TO
      IF(J + J .LE. I) GOTO 500

MUTANT PHYSICAL RECORD IS 180
STATEMENT 9 CHANGED FROM
      MATCH = MATCH + 2
TO
      MATCH = MATCH + K

************************************************************************


WITH THE ORDER AT 3

THE NUMBER OF CORRELATED MUTANTS OF TRIANG DRAWN WAS 3000
OF THOSE THE NUMBER OF LIVE DRAWS WERE 1


PROFILE OF EQUIVALENT COMPONENTS
NO. EQU MTS     NUMBER DRAWN     NO. SUCCESSFUL

     0              2743               0
     1               249               1
     2                 8               0
     3                 0               0


PROFILE ON METHOD OF 3-ORDER MUTANT FAILURE

2319  TERMINATED BUT PRODUCED WRONG ANSWERS
   9  HAD AN ARITHMETIC FAULT
   0  HAD AN ARRAY INDEXING ERROR
   0  EXECUTED A TRAP STATEMENT
 292  REFERENCED AN UNDEFINED VARIABLE
   0  ATTEMPTED TO DIVIDE BY ZERO
  57  EXCEEDED THE TIME LIMIT
   0  ATTEMPTED ILLEGAL DATA COERSION
 322  ATTEMPTED TO ALTER A READ ONLY VARIABLE


RESULTS FOR THE GENERATION OF 4-ORDER MUTANTS OF TRIANG


WITH THE ORDER AT 4

THE NUMBER OF CORRELATED MUTANTS OF TRIANG DRAWN WAS 3000
OF THOSE THE NUMBER OF LIVE DRAWS WERE 0


PROFILE OF EQUIVALENT COMPONENTS
NO. EQU MTS     NUMBER DRAWN     NO. SUCCESSFUL

     0              2644               0
     1               338               0
     2                18               0
     3                 0               0
     4                 0               0


PROFILE ON METHOD OF 4-ORDER MUTANT FAILURE

?30?  TERMINATED BUT PRODUCED WRONG ANSWERS
   ?  HAD AN ARITHMETIC FAULT
   0  HAD AN ARRAY INDEXING ERROR
   0  EXECUTED A TRAP STATEMENT
 27?  REFERENCED AN UNDEFINED VARIABLE
   0  ATTEMPTED TO DIVIDE BY ZERO
  13  EXCEEDED THE TIME LIMIT
   0  ATTEMPTED ILLEGAL DATA COERSION
 ?05  ATTEMPTED TO ALTER A READ ONLY VARIABLE


1978  ASQC  TECHNICAL  CONFERENCE  TRANSACTIONS- CHICAGO 


PROGRAM  MUTATION  AS  A  TOOL  FOR  MANAGING 
LARGE-SCALE  SOFTWARE  DEVELOPMENT 

Richard  DeMillo 

School  of  Information  and  Computer  Science 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 

Richard Lipton and Frederick Sayward
Department  of  Computer  Science 
Yale  University 
New  Haven,  Connecticut  06520 


INTRODUCTION 

Several  approaches  to  aid  in  the  design,  implementation  and  debugging  of  large- 
scale  software  have  recently  emerged.  Examples  are  restricted  modularization  (14), 
structured  programming  (14),  and  program  verification  (9,10).  However  helpful  they 
may  be  to  programmers  and  low-level  managers,  the  effects  of  these  techniques  cannot  be 
utilized  throughout  a  software  project  management  hierarchy  since  they  are  qualitative 
rather  than  quantitative:  managers  should  not  be  expected  to  understand  code  and/or 
sophisticated  mathematics. 

In this paper we explain how an important phase of software development, testing,
can be managed effectively by use of the new program testing approach known as program
mutation (15). Program mutation provides as a side effect the quantitative type of
information that managers need to monitor software development and personnel performance.
The basic idea is: given a program module and its test data, program mutation provides
a measure, in terms of a percentage, of how "well" the data actually tests the module.
The higher the percentage, the more adequately the program has been tested. A program
mutation system produces the percentage and users increase the measure by either
augmenting the data in a controlled fashion or by answering "hard" questions about the
module which are posed by the system. This process iterates until a satisfactory
testing percentage is obtained. Meanwhile, the program mutation system records all
the involved information in a data base which can be queried at any time by members
at all levels of the project hierarchy to obtain reports containing relevant information
on the project's testing status. For example, the project manager may wish to
know only the testing percentages of all program modules while a programmer may wish
to review in detail some or all of the questions and answers previously recorded for a
given module.

In section 2 we detail the theory of program mutation as a program testing tool.
Section 3 explains what types of information various members of the project hierarchy
would draw from the mutation system and how that information would be used as a
management tool. These concepts are illustrated in terms of a hypothetical compiler
construction project. Finally, in section 4 we present another application of program
mutation: monitoring software procurement.

THE  PROGRAM  MUTATION  METHODOLOGY 

Program testing is an inductive science which addresses the following fundamental
question:


"If  a  program  is  correct  on  a  finite  number  of  test  cases, 
is  it  correct  in  general?" 

Finite test data which implies general correctness is called adequate test data and
since adequate test data cannot in general be derived algorithmically (4) program
testing cannot be deductive. Recently, path analysis (1,2,5,6) and symbolic execution
(7,8) have emerged as methods which allow one to gain confidence in one's test data's
adequacy. Although, as with any inductive science, it is possible to make false
inferences with path analysis (3), the basic idea is undeniable: test data which exercises
all flowchart control paths of a program at least once must be better than test data
which doesn't. Symbolic execution is related to path analysis since, among other
things, it attempts to derive test data which exercises all paths of a program.


Unlike previous software reliability methods, in program mutation we make the
following assumption:

Experienced programmers write programs which are either correct
or are "almost" correct.

That is, in the mutation terminology:

If a program is not correct, then it is a "mutant" - it differs
from a correct program by simple well-understood errors.

There is empirical evidence which supports this natural premise (11).

Boehm  has  found  (12)  that  errors  fall  into  three  categories:  clerical,  logical, 
and  misunderstanding  of  specifications.  In  the  above  assumption  we  do  not  explicitly 
mention  errors  due  to  programmers  misunderstanding  specifications;  rather,  it  appears 
we  are  dealing  exclusively  with  clerical  errors.  While  a  system  which  would  solve  the 
clerical  error  problem  would  be  quite  useful,  program  mutation  does  even  more:  indeed, 
below we explain how the use of the program mutation methodology can lead to the
detection of all three error types.

With the "experienced programmer assumption", the mutation method is: take a
program P which is correct on some test data T and subject it to a series of mutant
operators, thereby producing mutant programs which differ from P in very simple ways.
For example, if

I = I+1

is a statement in P, then

I = I-1
I = I+2
I = I+0 (i.e., a no-op)

are all simple changes which lead to three mutants of P. The mutant programs are then
executed on T. If all mutants give incorrect results then it is very likely that P is
correct (i.e., we can infer with high confidence that T is adequate). On the other
hand, if some mutants are correct on T then we can infer that either:

(1)  The  mutants  are  equivalent  to  P, 

(2)  The  test  data  T  is  inadequate,  or 

(3)  The  program  P  is  incorrect. 

If  it  cannot  be  determined  that  P  is  incorrect  from  this  information,  then  T  must  be 
augmented  and  the  mutation  method  re-applied  in  an  attempt  to  make  the  non-equivalent 
mutants  which  are  correct  on  T  subsequently  fail.  This  augmentation  process  forces 
the  programmer  to  examine  P  in  detail  with  respect  to  the  mutants. 
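
The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the mutation system discussed in this paper: the program `larger`, the three hand-written mutants, and the helper `live_mutants` are hypothetical names introduced here, and the mutants are written by hand rather than generated by mutant operators.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[int, int], int]


def larger(a: int, b: int) -> int:
    """The program P under test: return the larger of two integers."""
    return a if a >= b else b


# Hand-written mutants of P, each differing from P by one simple change.
MUTANTS: Dict[str, Program] = {
    "GE->GT": lambda a, b: a if a > b else b,   # equivalent to P
    "GE->LE": lambda a, b: a if a <= b else b,  # returns the smaller value
    "ELSE_A": lambda a, b: a if a >= b else a,  # always returns a
}


def live_mutants(p: Program, mutants: Dict[str, Program],
                 tests: List[Tuple[int, int]]) -> List[str]:
    """Return the names of mutants that agree with P on every test case."""
    expected = [p(a, b) for a, b in tests]
    return [name for name, m in mutants.items()
            if [m(a, b) for a, b in tests] == expected]


if __name__ == "__main__":
    print(live_mutants(larger, MUTANTS, [(2, 1)]))
    print(live_mutants(larger, MUTANTS, [(2, 1), (1, 3)]))
```

With the single test case (2, 1), two mutants remain live, so T is inadequate. Adding (1, 3) eliminates "ELSE_A" and leaves only "GE->GT", which is equivalent to P and therefore falls under case (1) above.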

At first glance it would appear that if T is determined adequate by mutation
analysis, then P might still contain some complex errors which are not explicitly mutants
of P. To this end there is a coupling effect which states:

Test  data  on  which  all  simple  mutants  fail  is  so  sensitive  that  it  is 
highly  likely  that  all  complex  mutants  must  also  fail. 

That  is,  if  a  program  passes  tests  for  all  possible  simple  errors  then  it  has  been 
implicitly  tested  for  all  possible  complex  errors.  It  is  in  this  effect  that  the 
power  of  program  mutation  to  detect  the  so-called  logical  errors  of  Boehm  (12)  is 
revealed.  Experiments  which  substantiate  the  coupling  effect  are  reported  in  (13). 

Using program mutation as a tool for obtaining reliable software is a highly
interactive process whose success depends in part on human judgement. Due to the
following issues, the programmer must re-examine in critical detail both his program
and its specifications and why he made the decisions that led to the construction of
his program. The crucial issues which must be addressed by the users include:

(1) Which mutant operators should be applied to the program?

(2) Are the program and its mutants correct on the given test data?

(3)  Is  a  given  mutant  equivalent  to  the  program? 

It is here that specification errors are discovered. Note that it is possible for a
mutation system to provide the users with information which greatly facilitates
resolving these issues: indeed, a mutation system can even resolve them automatically in
some cases.


In using a program mutation system, a programmer specifies to the system his
program, test data, and the mutant operators he wishes to be applied. The system then
generates and executes the mutants on the test data and produces a report indicating
which mutants are correct on the given test data. The determination of mutant
correctness is done in one of two ways: (1) by direct comparison of the mutant output with
the program's output, or (2) by a user-supplied algorithm which examines the output of
the mutant. In both cases the system asks the user whether or not the program is
acceptable on the test data. However, determination of mutant failure is done by the
system.

Upon examining the report, the user may re-run the system and augment his test
data in an attempt to make the remaining mutants fail. He may also specify that
additional mutant operators be applied to the program. The system produces another report
of the same nature as the first for the user to examine. This cycle continues until
the user is satisfied that his current test data adequately tests his program.
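
The two ways of determining mutant correctness can give different verdicts on the same mutant. The following sketch illustrates one such situation; it is an assumption about how the two modes might differ, not a description of the system, and the names `index_of_max`, `mutant`, `same_output`, and `acceptable` are hypothetical.

```python
from typing import List


def index_of_max(xs: List[int]) -> int:
    """Program under test: index of the first maximal element."""
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:
            best = i
    return best


def mutant(xs: List[int]) -> int:
    """Mutant with > replaced by >=: index of the last maximal element."""
    best = 0
    for i in range(1, len(xs)):
        if xs[i] >= xs[best]:
            best = i
    return best


def same_output(test: List[int]) -> bool:
    """Way (1): compare the mutant's output directly with the program's."""
    return mutant(test) == index_of_max(test)


def acceptable(test: List[int], out: int) -> bool:
    """Way (2): a user-supplied check that `out` indexes a maximal element."""
    return 0 <= out < len(test) and test[out] == max(test)
```

On the test case [3, 1, 3] the mutant returns a different index than the program, so direct comparison counts it as failing, while the user-supplied check accepts its answer and the mutant stays live. Which verdict is appropriate depends on what the specification requires.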

MANAGEMENT  ASPECTS  OF  PROGRAM  MUTATION 

Successful large-scale programming projects rely on a hierarchical flow of
information and decisions. A fragment of such a project structuring is represented in
figure 1. In addition, there is a recognizable time-ordering of events for gathering
information and making decisions which correlates with the hierarchical management
structure. Events such as "decide input file structure", "gather documentation from
the submodules of module M1", "begin testing module M5" provide transformations of the
programming task, replacing the as yet incomplete project with the next stage as
determined by the most current information. The management hierarchy generally parallels
the modular decomposition of the programming task. This can be seen directly in
figure 2 where we illustrate a decomposition of a multiple pass compiler.


                          PROJECT MANAGER

CHIEF PROGRAMMER 1     CHIEF PROGRAMMER 2   ...   CHIEF PROGRAMMER M

   PROGRAMMERS            PROGRAMMERS                PROGRAMMERS
   TESTERS                TESTERS                    TESTERS


Figure 1. Hierarchical Management Organization

During the test phase of the project the mutation system records a wealth of
information in its data base and this data is used to produce reports which directly
influence decision-making throughout the project hierarchy. The type of information
drawn from the mutation system and its uses vary depending on the project hierarchy
level of the querier. In this section we sketch some possibilities for the three
levels illustrated in figures 1 and 2. Additional possibilities can readily be
imagined. The general idea is: the higher the querier is in the project structure, the
less programming oriented is the gathered information.

                         COMPILER PROJECT

SCANNER        PARSER         CODE GENERATOR     MACHINE DEPENDENT
MODULE         MODULE         MODULE             OPTIMIZATION MODULE

SUBMODULES     SUBMODULES     SUBMODULES         SUBMODULES


Figure 2. Modular Decomposition of a Multiple Pass Compiler


Project Manager's Report

The project manager periodically meets with the chief programmers to evaluate the
project's testing status. Also, the assignment of personnel and the evaluation of
personnel performance are done at this level. The project manager's report would
contain information such as:

(1) The name of each module.

(2) The chief programmer responsible for each module.

(3) Plots of the mutant elimination percentage vs. time for each submodule.

(4) For each submodule, (a) the number of mutants, (b) the number and the
percentage of eliminated mutants, (c) the number and percentage of mutants
deemed equivalent, and (d) the number and percentage of non-eliminated
mutants.

(5) For each module, the number and type of assigned personnel.

(6) For each submodule, the number and type of assigned personnel.

This information can be used by the project manager to help do the following:

(1) Monitor adherence to the project's testing PERT chart.

(2) Decide whether an acceptable level of testing has been obtained for a
given module or submodule.

(3) Reassign personnel to work on modules where the mutant elimination
percentage is low.

(4) Reward personnel who achieve high mutant elimination percentages.

(5) Pinpoint responsibility for modules which fail after having been judged
acceptable.

Chief Programmer's Report

A chief programmer should be familiar with the program code of all the submodules
but he doesn't necessarily do any of the programming himself. He meets daily with his
subordinate personnel. The type of information contained in a chief programmer's
report would include:

(1)  The  names  and  program  code  for  each  submodule  of  his  module. 

(2)  The  personnel  assigned  to  each  submodule. 

(3) Plots of the mutant elimination percentage vs. time for each submodule.

(4) The mutant operators being applied to each submodule.

(5) For each submodule, (a) the number of mutants, (b) the number and the
percentage of eliminated mutants, (c) the number and percentage of mutants
deemed equivalent, and (d) the number and percentage of non-eliminated
mutants.

(6) Listings, in coded form, of mutants determined equivalent.

(7)  Personnel  responsible  for  classifying  mutants  as  equivalent. 

This  information  can  be  used  by  the  chief  programmer  to  do  the  following: 

(1)  Suggest  to  the  programmers  additional  mutant  operators  for  a  given 
submodule. 

(2)  Ask  a  programmer  to  justify  his  judgement  of  mutants  as  equivalent.  The 
chief  programmer  may  want  to  know,  for  instance,  why  it  does  not  matter 
if  a  certain  variable  can  be  mutated  without  changing  the  effect  of  the 
submodule.  That  is,  why  is  his  submodule  so  insensitive  to  that  mutation? 


(3) Determine that a given submodule has been acceptably tested and
prepare evidence on this decision for presentation to the project
manager.

Programmer's and Tester's Report

These personnel are concerned mainly with the details of program code and data
and thus their reports will be the most lengthy. The type of information would
include:

(1) A listing of the submodule code.

(2) The current test data for the submodule.

(3)  The  mutant  operators  currently  being  applied  to  the  submodule. 

(4) For the submodule, (a) the number of mutants, (b) the number and the
percentage of eliminated mutants, (c) the number and percentage of mutants
deemed equivalent, and (d) the number and percentage of non-eliminated
mutants.

(5) Profiles of the information in (4) with respect to the mutant operators
currently being applied.

(6) Listings, in coded form, of the non-eliminated mutants.

(7)  Listings,  in  coded  form,  of  the  mutants  determined  equivalent. 

This  information  could  be  used  by  programmers  and  testers  to  do  the  following: 

(1) Augment the current test data so as to eliminate mutants on the next
mutation run.

(2) Augment the set of applied mutant operators for the next mutation run.

(3) Classify non-eliminated mutants as equivalent.

(4) Determine that the submodule has been adequately tested and prepare
evidence of this for presentation to the chief programmer.
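
The idea that each level of the hierarchy queries the same data base at a different level of detail can be sketched as follows. The record layout and the names `SubmoduleRecord`, `manager_view`, and `programmer_view` are hypothetical; the paper does not specify a data base schema, so this is only one possible arrangement.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmoduleRecord:
    """Hypothetical data base record for one submodule after a mutation run."""
    name: str
    total: int        # number of mutants generated
    eliminated: int   # mutants that failed on the current test data
    equivalent: int   # mutants judged equivalent to the submodule
    live: List[str] = field(default_factory=list)  # coded non-eliminated mutants

    def percent(self, count: int) -> float:
        """Express a mutant count as a percentage of all mutants."""
        return 100.0 * count / self.total if self.total else 0.0


def manager_view(records: List[SubmoduleRecord]) -> Dict[str, float]:
    """Project manager: only the elimination percentage of each submodule."""
    return {r.name: round(r.percent(r.eliminated), 1) for r in records}


def programmer_view(r: SubmoduleRecord) -> Dict[str, object]:
    """Programmer or tester: all counts plus the listing of live mutants."""
    remaining = r.total - r.eliminated - r.equivalent
    return {
        "mutants": r.total,
        "eliminated": (r.eliminated, round(r.percent(r.eliminated), 1)),
        "equivalent": (r.equivalent, round(r.percent(r.equivalent), 1)),
        "non_eliminated": (remaining, round(r.percent(remaining), 1)),
        "live_listing": list(r.live),
    }
```

For a made-up record such as `SubmoduleRecord("scanner", 50, 45, 3, ["m17", "m42"])`, the manager sees only `{"scanner": 90.0}`, while the programmer also sees the two remaining mutants that still need a test case or an equivalence judgement.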

SOFTWARE PROCUREMENT ASPECTS OF PROGRAM MUTATION

Government agencies and profit making industries are currently finding that
purchasing software from specialized software vendors is more economical than in-house
development. The contracts generally consist of the specifications for the software
and a date on which the software and test data on which the software meets the
specifications are to be delivered. Occasionally, some test data is given with the
specifications.

Two problems for the contractor are apparent in the above scheme: (1) at any
time during the contract period the purchaser has no indication as to how "close" the
software is to being ready, and (2) upon delivery, although the software works
correctly on the supplied test data, there is no way to measure the quality of the purchased
software. We see program mutation as a partial solution to the first problem and as a
definite solution to the second.

Since  program  testing  is  the  final  stage  of  software  development,  a  contractor 
can  specify  that  the  vendor  indicate  at  what  point  testing  commences.  Assuming  that 
the  vendor  is  using  a  mutation  system,  the  contractor  can  monitor  the  final  stage  of 
development  by  having  the  vendor  periodically  report  mutant  elimination  percentages. 

To evaluate the delivered software, one can specify in contracts that the test
data of modules must eliminate a certain percentage of the mutants with respect to
"standard" mutant operators. Here there are many options. Software not passing this
quality test may be rejected or there could be a substantial financial penalty to the
vendor. In this case it is not essential that the vendor use a mutation system, only
that the contractor have one available to evaluate the final product. Also, note that
the contractor is not concerned with equivalent mutants; rather, a simple test (which
can be entirely computerized) dependent solely on the mutant operators is used.
Currently, we have little information on which mutant operators should be employed in this
test; however, experiments to answer this question are underway. We have observed
empirically (13,15) that the percentage of equivalent mutants tends to be about two.

SUMMARY 

Program mutation is an important new tool in the field of program testing which
has applications in other fields. Above it has been explained how, unlike other
current programming methodologies, a program mutation system can provide quantitative
information which can be used throughout the management hierarchy of a large
programming project. Furthermore, program mutation has an important application in that it
can be incorporated into contracts for software procurement. It provides purchasers of
software with a means of measuring the quality of the delivered product.




ACKNOWLEDGEMENT 

We acknowledge the work of Tim Budd and the other members of the Yale University
Testing Group for help in implementing and experimenting with the prototype mutation
system developed at Yale University.

REFERENCES 

1. C. V. Ramamoorthy, S. F. Ho, and W. T. Chen, "On the Automated Generation of
Program Test Data," IEEE Transactions on Software Engineering SE-2,4 (December
1976), pp. 293-300.

2. W. E. Howden, "Methodology for the Generation of Program Test Data," IEEE
Transactions on Computers C-24,5 (May 1976), pp. 554-560.

3. W. E. Howden, "Reliability of the Path Analysis Testing Strategy," IEEE
Transactions on Software Engineering SE-2,3 (September 1976), pp. 208-214.

4. J. B. Goodenough and S. L. Gerhart, "Towards a Theory of Test Data Selection,"
IEEE Transactions on Software Engineering SE-1,2 (June 1975), pp. 156-173.

5. J. C. Huang, "An Approach to Program Testing," Computing Surveys 7,3 (September
1975), pp. 113-128.

6. E. F. Miller and R. A. Melton, "Automated Generation of Testcase Datasets," in
Proceedings of the First International Conference on Reliable Software, SIGPLAN
Notices 10,6 (June 1975), pp. 51-58.

7. L. Clarke, "A System to Generate Test Data and Symbolically Execute Programs,"
IEEE Transactions on Software Engineering SE-2,3 (September 1976), pp. 215-222.

8. J. King, "Symbolic Execution and Program Testing," Communications of the ACM
19,7 (July 1976), pp. 385-394.

9. R. London, "The Current State of Proving Programs Correct," in Proceedings of
the ACM National Conference, 1972, ACM, New York, pp. 39-46.

10. S. Hantler and J. King, "An Introduction to Proving the Correctness of Programs,"
Computing Surveys 8,3 (September 1976), pp. 331-353.

11. E. A. Youngs, "Human Errors in Programming," International Journal of Man-
Machine Studies 6 (1974), pp. 361-376.

12. B. Boehm, "Software Design and Structuring," in Practical Strategies for
Developing Large Software Systems, Horowitz (Editor), Addison-Wesley, 1975,
pp. 103-128.

13. R. DeMillo, R. Lipton, and F. Sayward, "Hints on Test Data Selection," to
appear in Computer, April 1978.

14. Special Issue: Programming, ACM Computing Surveys 6,4 (December 1974),
pp. 209-319.

15. R. DeMillo, R. Lipton, and F. Sayward, "Program Mutation: A Method of Determining
Test Data Adequacy," submitted to the Third Int. Conf. on Soft. Eng. (1978).




STABILITY  OF  TEST  DATA  FROM  PROGRAM  MUTATION 


James  E.  Burns 

School  of  Information  and  Computer  Science 


GEORGIA  INSTITUTE  OF  TECHNOLOGY 
Atlanta,  GA  30332 


1. INTRODUCTION

Program testing is an expensive part of program development. A
significant portion of this cost may go into the creation of high quality
test data. In an active environment, it is rare for a program to remain
unmodified over a long period. Considerable effort can be saved if test
data created for earlier program versions can be shown to be satisfactory
for testing new versions.

The  second  section  of  this  paper  briefly  introduces  a  promising 
new  tool  for  program  testing,  program  mutation.  Program  mutation  has 
the  attractive  qualities  that  it  aids  in  finding  good  test  data  sets 
and  also  provides  a  quantitative  measure  of  how  good  they  are.  Section 
3  describes  the  experiment  performed  to  test  the  hypothesis  that  test 
data  produced  by  program  mutation  tends  to  be  stable.  The  final  two 
sections  give  the  results  of  the  experiment  and  draw  conclusions. 

2.  PROGRAM  MUTATION 

Program  mutation  is  a  recently  developed  technique  for  creating 
high  quality  test  data.  A  brief  description  of  the  technique  is  given 
here,  but  the  reader  is  referred  to  references  [1-4]  for  a  more  complete 
explanation, especially regarding motivation.

This work was supported in part by U.S. Army
Research Office Grant No. DAAG29-78-G-0121.




The central idea of program mutation is the construction of a
set of "mutants" of the target program. A mutant is a copy of the
target program which differs only by a single "mutation". A mutation
is a transformation of a program statement in a way which simulates
typical program errors. For example, one mutation is to modify the
value of a literal constant - the FORTRAN statement "I = I+3" might
be changed to "I = I+2". Some mutants may turn out to be equivalent,
functionally, to the target program. The remainder should be
distinguished from the target program by sufficiently powerful test data.
Test data which is able to distinguish all non-equivalent mutants of
a target program must thoroughly exercise the program and, hence,
provide strong evidence of the program's correctness.

Let P be a program and M(P) be the set of mutants of P.
(Note: M(P) depends on the language of P and the set of mutations
chosen. We assume a fixed language and a fixed set of mutations
for purposes of discussion.) Let Q(P) be the subset of M(P) whose
members are functionally equivalent to P. For a given set of test data, T,
an element m ∈ M(P) is said to be eliminated by T if and only if
there is at least one element of T which distinguishes m from P;
otherwise, m is said to be live. Now we may define the adequacy of
T for P by

$$A(T,P) = 100\% \times \frac{|M(P) - L(T,P) - Q(P)|}{|M(P) - Q(P)|}$$

where

$$L(T,P) = \{\, m \in M(P) \mid m \text{ is live for } P \text{ under } T \,\}$$

Here the differences are set differences and the bars denote the number of
elements in a set. When A(T,P) = 100%, we say that T is adequate for P. If no
element of T can be removed without reducing the adequacy of T, then T is said to
be reduced.
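
These definitions translate directly into set operations. The sketch below is illustrative only: the mutant and test case names are made up, `killed_by` is a hypothetical table recording which mutants each test case distinguishes from P, and the single-pass reduction is one simple way of obtaining a reduced test set, not necessarily the one used in the experiment.

```python
from typing import Dict, FrozenSet, List, Set

# killed_by[t] is the set of mutants that test case t distinguishes from P.
KilledBy = Dict[str, FrozenSet[str]]


def live(mutants: Set[str], tests: List[str], killed_by: KilledBy) -> Set[str]:
    """L(T,P): mutants not distinguished from P by any test case in T."""
    killed: Set[str] = set()
    for t in tests:
        killed |= killed_by[t]
    return mutants - killed


def adequacy(mutants: Set[str], equivalent: Set[str],
             tests: List[str], killed_by: KilledBy) -> float:
    """A(T,P) = 100% * |M - L - Q| / |M - Q|."""
    non_equivalent = mutants - equivalent
    if not non_equivalent:
        return 100.0  # assumption: nothing to eliminate counts as adequate
    eliminated = mutants - live(mutants, tests, killed_by) - equivalent
    return 100.0 * len(eliminated) / len(non_equivalent)


def reduce_tests(mutants: Set[str], equivalent: Set[str],
                 tests: List[str], killed_by: KilledBy) -> List[str]:
    """Drop test cases one at a time as long as the adequacy is unchanged."""
    target = adequacy(mutants, equivalent, tests, killed_by)
    kept = list(tests)
    for t in tests:
        trial = [x for x in kept if x != t]
        if adequacy(mutants, equivalent, trial, killed_by) == target:
            kept = trial
    return kept


# Made-up example: five mutants, one of them equivalent, three test cases.
M = {"m1", "m2", "m3", "m4", "m5"}
Q = {"m5"}
KILLED_BY: KilledBy = {
    "t1": frozenset({"m1", "m2"}),
    "t2": frozenset({"m2"}),
    "t3": frozenset({"m3"}),
}
```

In this example the three test cases eliminate three of the four non-equivalent mutants, so the adequacy is 75%; test case t2 eliminates nothing that t1 does not, and the reduction removes it.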




Adequacy provides a quantitative measure of the thoroughness
with which a set of test data exercises the target program.
Unfortunately, this measure may be difficult to compute since Q(P) is
usually difficult to determine. However, the following approximation
to A(T,P) is usually sufficiently accurate since, empirically, Q(P)
is rarely greater than 5% of M(P):

$$A'(T,P) = 100\% \times \frac{|M(P) - L(T,P)|}{|M(P)|}$$

Note that A'(T,P) always approximates A(T,P) from below. Also, if
part of Q(P) can be easily determined, the approximation can be
improved.
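
The claim that A'(T,P) approximates A(T,P) from below can be checked from the definitions. The short derivation here assumes that the quotients are taken between set sizes, that every equivalent mutant is live (an equivalent mutant can never be distinguished from P), and that at least one mutant is not equivalent:

```latex
\begin{align*}
Q(P) \subseteq L(T,P)
  &\;\Longrightarrow\; M(P) - L(T,P) - Q(P) = M(P) - L(T,P), \\
A'(T,P) = 100\% \times \frac{|M(P) - L(T,P)|}{|M(P)|}
  &\;\le\; 100\% \times \frac{|M(P) - L(T,P)|}{|M(P)| - |Q(P)|} = A(T,P).
\end{align*}
```

Equality holds only when no mutant is equivalent or no mutant is eliminated. As a numerical check using figures reported in Tables 1 and 2 of this paper, program A has 187 mutants and its own test set leaves 8 live, all of them equivalent, so A' = 179/187, about 95.7%, while A = 179/179 = 100%.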

3.  STABILITY  OF  TEST  DATA 

Intuitively, the stability of a set of test data refers to how
powerful it is in testing programs which are slightly modified versions
of the program for which the test data was developed. The adequacy
measure gives a means of quantifying this concept with the following
definition.

Let $\mathcal{P}$ be a set of closely related programs, $\{P_1, P_2, \ldots, P_n\}$.
Assume that all the programs in $\mathcal{P}$ will correctly accept the same
set of test data, T. Then the stability of T relative to $\mathcal{P}$ is given
by

$$S(T,\mathcal{P}) = \min_{1 \le i \le n} A(T,P_i)$$

We also define an approximation to this measure by

$$S'(T,\mathcal{P}) = \min_{1 \le i \le n} A'(T,P_i)$$
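
As a worked instance of the approximate stability measure, the sketch below recomputes it for one test set from the counts reported in Table 1 of this paper (the mutant totals for programs A through J and the live mutants left by test set F). The function and variable names are hypothetical and chosen only for this illustration.

```python
from typing import Dict

# Number of mutants generated for each program (column headings of Table 1).
MUTANT_COUNTS: Dict[str, int] = {
    "A": 187, "B": 240, "C": 266, "D": 274, "E": 282,
    "F": 830, "G": 1094, "H": 1233, "I": 1838, "J": 2292,
}

# Live mutants left by test set F on each program (row F of Table 1).
LIVE_F: Dict[str, int] = {
    "A": 8, "B": 9, "C": 6, "D": 9, "E": 8,
    "F": 23, "G": 69, "H": 75, "I": 247, "J": 300,
}


def approx_adequacy(total: int, live: int) -> float:
    """A'(T,P) = 100% * (|M(P)| - |L(T,P)|) / |M(P)|."""
    return 100.0 * (total - live) / total


def approx_stability(totals: Dict[str, int], live: Dict[str, int]) -> float:
    """S': the minimum of A'(T,P_i) over all programs in the set."""
    return min(approx_adequacy(totals[p], live[p]) for p in totals)


if __name__ == "__main__":
    print(round(approx_stability(MUTANT_COUNTS, LIVE_F), 1))
```

The minimum is reached on program I, where 247 of 1838 mutants remain live, giving a value of about 86.6%; this agrees with the entry for that test set and program in Table 3.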




The stability of T is, of course, highly dependent on the program set
$\mathcal{P}$. We wish to determine whether or not test data produced by program
mutation is relatively stable for $\mathcal{P}$ chosen to be sufficiently "similar" to
the program for which the test data was created. For this experiment,
ten sorting algorithms were chosen (see Appendix A). Since these
programs are functionally identical, they are certainly highly similar.
There is some motivation for using functionally identical programs in
this study since one type of program modification is the replacement of
an algorithm with a functionally identical but more efficient one.

If test data produced by the program mutation method is stable, high
values of the stability measure would be expected. Test data developed for
more complex algorithms would be expected to have higher values of the measure
since such algorithms would tend to require stronger test data.

Each  of  the  algorithms  listed  in  Appendix  A  was  coded  into  the 
subset  of  FORTRAN  accepted  by  the  Pilot  Mutation  System  (PIMS)  developed 
at  Yale  University.  PIMS  automatically  generates  the  set  of  mutants  which 
are to be eliminated by test data. Test cases may be entered
interactively and tested by the PIMS interpreter against the mutants. The
living  mutants  may  be  examined  through  the  system  to  aid  in  selecting 
additional  test  data. 

Test data was developed for each program in the set independently.
An attempt was made in each case to eliminate all mutants which could not
be identified as being functionally equivalent to the original program.
(In all but three cases, all non-equivalent mutants were eliminated.)
The resulting test sets were then reduced to remove any inessential test
cases. Finally, the test set for each program was run against each of
the other programs to determine the number of mutants eliminated in each
case.




4.  RESULTS 

The raw results of the experiment described above are presented
in Table 1. The programs are ordered by the number of mutants produced
by PIMS, which is a rough measure of their complexity. Table 2 gives
the number of living mutants left by each test set with all mutants
which could be determined to be equivalent removed. The adequacy
measures, A'(T,P) and A(T,P), and stability measures, $S'(T,\mathcal{P})$ and
$S(T,\mathcal{P})$, are given in Tables 3 and 4.

Two  sets  of  test  data,  (I  and  J,  produced  by  Quicksort  and  the 
Natural  Two-way  Merge  Sort),  provided  very  strong  test  data  with  stability 
measures  of  over  98%.  In  fact,  the  test  data  from  Quicksort  was  able 
to  eliminate  more  mutants  of  the  Merge  Exchange  (H)  and  the  Natural 
Two-way  Merge  Sort  (J)  than  the  data  created  using  the  PIMS  system 
directly.  This  may  result  in  part  from  the  large  number  of  test  cases 
(14)  required  by  Quicksort. 

The remaining test sets did not produce impressive stability
measures over this set of programs, although Heapsort (F) did achieve
a respectable 88.6%. However, if the two most complex programs (I & J)
are removed from the set used to compute the measure, all of the
test sets have stability measures near 90%.

5.  CONCLUSIONS 

If  all  of  the  test  sets  produced  in  this  experiment  had  proven 
to  have  had  very  high  stability  measures,  we  could  conclude  that  there 
was  evidence  that  test  data  from  program  mutation  was  stable.  This 




TABLE 1

Number of Live Mutants

                             Program / # of Mutants
Test Set      A     B     C     D     E     F     G     H     I     J
(# cases)   187   240   266   274   282   830  1094  1233  1838  2292

A  (4)        8    11     9    12    15   128    78   168   567  1104
B  (6)        8     8     6    10    15   127    71   159   547  1087
C  (5)        8    10     6     9    10    98    71   154   421  1094
D  (5)        8    11     7     9     8    52    72   142   548   555
E  (4)       10    15    12    13     8    56    81   166   788   563
F  (8)        8     9     6     9     8    23    69    75   247   300
G  (6)        8     9     7    10     8    79    50   152   409  1084
H  (5)        8     8     6     9     8    25    68    42   246   517
I  (14)       9     8     6     9     8    24    50    32   132    54
J  (6)       10    12     7    10     8    25    55    43   193    74


TABLE 2

Number of (Live - Equivalent) Mutants

                                  Program

Test Set        A     B     C     D     E     F     G     H     I     J

A               0     3     3     3     7   105    28   138   439  1061
B               0     0     0     1     7   104    21   129   419  1044
C               0     2     0     0     2    75    21   124   293  1054
D               0     3     1     0     0    29    22   112   420   512
E               2     7     6     4     0    33    31   136   660   520
F               0     1     0     0     0     0    14    45   119   257
G               0     1     1     1     0    56     0   122   281  1041
H               0     0     0     0     0     2    18    12   118   474
I               1     0     0     0     0     1     0     2     4    11
J               2     4     1     1     0     2     5    13    65    31


TABLE 3

Approximation of Adequacy: A'(T,P)

                                  Program

Test Set        A     B     C     D     E     F     G     H     I     J

A            95.7  95.4  96.6  95.6  94.7  84.6  92.9  86.4  69.2  51.8
B            95.7  99.2  97.7  96.4  94.7  84.7  93.5  87.1  70.2  52.6
C            95.7  95.8  97.7  96.7  96.5  88.2  93.5  87.5  77.1  52.3
D            95.7  95.4  97.4  96.7  97.2  93.7  93.4  88.5  70.2  75.8
E            94.7  93.3  95.5  95.3  97.2  93.3  92.6  86.5  57.1  75.4
F            95.7  96.3  97.7  96.7  97.2  97.2  93.7  93.9  86.6  86.9
G            95.7  96.3  97.4  96.4  97.2  90.1  95.4  87.7  77.7  52.7
H            95.7  99.2  97.7  96.7  97.2  97.0  93.8  96.6  86.6  77.4
I            94.7  99.2  97.7  96.7  97.2  97.1  95.4  97.4  92.8  97.6
J            94.7  95.0  97.4  96.4  97.2  97.0  95.0  96.5  89.5  96.8
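
Many entries of Table 3 can be reproduced from Table 1 by reading A'(T,P)
as the percentage of a program's mutants that the test set eliminates.
That reading is an assumption checked against a handful of entries, not a
definition quoted from this section, and the helper name below is
hypothetical. A minimal sketch:

```python
# Assumed reading: A'(T,P) = 100 * (1 - live / total), where "live" counts
# every surviving mutant (equivalent ones included), as in Table 1.
def approx_adequacy(live, total):
    return round(100.0 * (1.0 - live / total), 1)

# (live mutants, total mutants) pairs copied from Table 1
a_on_a = approx_adequacy(8, 187)      # test set A on program A
a_on_j = approx_adequacy(1104, 2292)  # test set A on program J
i_on_j = approx_adequacy(54, 2292)    # test set I on program J

print(a_on_a, a_on_j, i_on_j)
```

Under this assumption the three values agree with the corresponding
Table 3 entries (rows A and I).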


TABLE 4

Adequacy: A(T,P), by Test Set and Program

[Table entries are not legible in this copy.]




would  imply  that  it  would  not  be  necessary  to  re-analyze  a  program 
every  time  a  small  change  was  made  to  it.  The  negative  result  implies 
instead  that  the  test  data  may  not  be  stable,  especially  if  the  change 
made to a program increases its complexity. Thus, it is prudent to
perform  mutation  analysis  on  revisions  to  programs;  however,  it  is 
likely  that  previously  derived  test  data  will  provide  a  good  starting 
point  for  mutant  elimination.  In  many  cases  it  will  be  unnecessary 
to  generate  any  new  test  cases  at  all. 




References 


[1]  Budd, T. A. and R. J. Lipton, "Mutation Analysis of Decision
Table Programs," Proc. of the 1978 Johns Hopkins Conf. on
Information Systems and Sciences, p. 346-349.


[2]  Budd, T. A., R. J. Lipton, F. G. Sayward and R. A. DeMillo,
"The Design of a Prototype Mutation System for Program Testing,"
Proc. 1978 NCC, p. 623-628.


[3]  DeMillo, R. A., R. J. Lipton and F. G. Sayward, "Hints on Test
Data Selection: Help for the Practicing Programmer," Computer
11(4), (April 1978), p. 34-41.


[4]  DeMillo, R. A., R. J. Lipton and F. G. Sayward, "PROGRAM MUTATION:
A New Approach to Program Testing," Infotech/SRA State of the
Art Report: Program Testing (to appear 1978).




Volume  7,  number  4 


INFORMATION  PROCESSING  LETTERS 


June  1978 


A  PROBABILISTIC  REMARK  ON  ALGEBRAIC  PROGRAM  TESTING 
Richard A. DeMILLO

School of Information and Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA

Richard J. LIPTON

Computer Science Department, Yale University, New Haven, CT 06520, USA

Received  8  August  1977;  revised  version  received  27  March  1978 


Software  reliability,  program  testing 


Until very recently, research in software reliability
has divided quite neatly into two - usually warring -
camps: methodologies with a mathematical basis and
methodologies without such a basis. In the former
view, "reliability" is identified with "correctness" and
the principal tool has been formal and informal
verification [1]. In the latter view, "reliability" is taken
to mean the ability to meet overall functional goals
to within some predefined limits [2,3]. We have
argued in [4] that the latter view holds a great deal
of promise for further development at both the
practical and analytical levels. Howden [5] proposes a
first step in this direction by describing a method for
"testing" a certain restricted class of programs whose
behavior can - in a sense Howden makes precise -
be algebraicized. In this way, "testing" a program is
reduced to an equivalence test, the major components
of which become

(i)  a combinatorial identification of "equivalent"
structures;

(ii)  an algebraic test

$$f_1 \equiv f_2 ,$$

where $f_i$, $i = 1, 2$, is a multivariable polynomial
(multinomial) of degree specified by the program
being considered.

In arriving at a method for exact solution of (ii),
Howden derives an algorithm that requires evaluation
of multinomials $f(x_1, \ldots, x_m)$ of maximal degree d at
$O((d + 1)^m)$ points. For large values of m (a typical
case for realistic examples), this method becomes
prohibitively expensive.

Since, however, a test for reliability rather than a
certification of correctness is desired, a natural
question is whether or not Howden's method can be
improved by settling for less than an exact solution to
(ii).

We are inspired by Rabin [6] and, less directly,
by the many successes of Erdos and Spencer [7] to
attempt a probabilistic solution to (ii). Using these
methods, we show that (ii) can be tested with
probability of error $\epsilon$ (*) with only $O(g(\epsilon))$ evaluations of
multinomials, where g is a slowly growing function of
only $\epsilon$. In particular, 30 or so evaluations should give
sufficiently small probability of error for most
practical situations. The remainder of this note is devoted
to proving this result.
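
The probabilistic approach can be sketched as follows. This is only an
illustration under stated assumptions: the two multinomials are supplied
as ordinary Python callables, the sample points are drawn uniformly from
{1, ..., r} in each of the m coordinates, and the function name and the
example polynomials are hypothetical rather than taken from the note.

```python
import random

def probably_equal(f1, f2, m, r, t, rng=random):
    """Randomized test of f1 == f2 as functions of m variables.

    Returns False as soon as a witness point with f1 != f2 is found
    (the answer is then certain); returns True if t random points
    produce no witness (the answer may be wrong with small probability).
    """
    for _ in range(t):
        point = [rng.randint(1, r) for _ in range(m)]
        if f1(*point) != f2(*point):
            return False
    return True

# Hypothetical example: (x + y)^2 against a correct and a faulty expansion.
f = lambda x, y: (x + y) ** 2
g = lambda x, y: x * x + 2 * x * y + y * y   # identical to f
h = lambda x, y: x * x + y * y               # differs from f by 2xy

print(probably_equal(f, g, m=2, r=100, t=30))
print(probably_equal(f, h, m=2, r=100, t=30))
```

Because the difference 2xy never vanishes on positive integers, the faulty
expansion is rejected on the first sample in this particular example.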

Let us denote by $P_{\not\equiv 0}(m, d)$ the class of multinomials

$$f(x_1, \ldots, x_m) \not\equiv 0$$

(over some arbitrary but fixed integral domain) whose
degree does not exceed d > 0. We define

$$P(m, d, r) = \min_{f \in P_{\not\equiv 0}(m, d)} \mathrm{Prob}\{1 \le x_i \le r,\ f(x_1, \ldots, x_m) \ne 0\} .$$


(*) See Rabin's account of algorithms that may err with fixed
probability [6].


We think of P(m, d, r) as the minimal relative
frequency with which witnesses to the non-nullity of a
multinomial of the appropriate kind can occur in the
chosen interval. We will derive a lower bound p for
P(m, d, r). Then (1 - p) is an upper bound on the
error in selecting a random point from the m-cube.

We then iterate the procedure by t independent
random selections to obtain a small probability of error
$(1 - p)^t$. Notice, in particular, that since a polynomial
of degree d has at most d roots (ignoring multiplicity),
the probability of finding a root by randomly
sampling in the interval $1 \le x_1 \le r$ is at most d/r; thus

$$P(1, d, r) \ge 1 - d/r .$$

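Now, consider some

$$f(x_1, \ldots, x_m, y) \not\equiv 0$$

of degree at most d. But there are then multinomials
$\{g_i\}$, not all $\equiv 0$, such that

$$f(x_1, \ldots, x_m, y) = \sum_{i=0}^{d} g_i(x_1, \ldots, x_m)\, y^i .$$

Let us suppose that $g_k \in P_{\not\equiv 0}(m, d)$. Thus

$$\mathrm{Prob}\{1 \le x_i \le r,\ f(x_1, \ldots, x_m, y) \ne 0\}
  \ge \mathrm{Prob}\{g_k(x_1, \ldots, x_m) \ne 0 \text{ and } y \text{ is not a root}\}
  \ge P(m, d, r)(1 - d/r) .$$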

Table 1

Probable error in testing f(x_1, ..., x_m) ≡ 0 (degree ≤ d) by t random
evaluations in {1, ..., r}: [1 - P(m, d, r)]^t

dm      r       t = 10         t = 20         t = 30         t = 50         t = 100

10      10      10 x 10^-3     106 x 10^-6    1 x 10^-6      109 x 10^-12   12 x 10^-21
20      10      233 x 10^-3    54 x 10^-3     13 x 10^-3     695 x 10^-6    483 x 10^-9
50      10      935 x 10^-3    873 x 10^-3    816 x 10^-3    713 x 10^-3    509 x 10^-3
10^2    10      ~1             ~1             ~1             ~1             ~1
10      10^2    61 x 10^-12    <10^-20        <10^-20        <10^-20        ~0
20      10^2    38 x 10^-9     1 x 10^-15     <10^-20        <10^-20        ~0
50      10^2    88 x 10^-6     8 x 10^-9      704 x 10^-15   <10^-20        <10^-20
10^3    10^2    ~1             ~1             ~1             ~1             ~1
10      10^3    <10^-20        <10^-20        <10^-20        ~0             ~0
20      10^3    9 x 10^-18     <10^-20        <10^-20        ~0             ~0
50      10^3    76 x 10^-15    <10^-20        <10^-20        ~0             ~0



Continuing inductively, we obtain a lower bound
on P(m, d, r) as follows:

$$P(m, d, r) \ge (1 - d/r)^m . \qquad (1)$$

But

$$\lim (1 - d/r)^m = \lim \left[ 1 + \frac{1}{(-r/d)} \right]^{(-r/d)(-dm/r)} = \exp(-dm/r) . \qquad (2)$$

Combining (1) and (2), we have for large m, $r \sim dm$,

$$P(m, d, dm) \ge e^{-1} .$$

Thus, with t evaluations of f for independent choices
of points from the m-cube with sides r = dm, the
probability of missing a witness to the non-nullity of
$f(x_1, \ldots, x_m)$ is at most

$$(1 - e^{-1})^t .$$

Table 1 shows the probable error in testing f ≡ 0
by t evaluations of f at randomly chosen points for
some typical values of d, m, r, t. Notice that for
dm = r, t = 30, this is already $< 10^{-5}$.
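
As a numerical check, the bounds above can be evaluated directly. The
sketch below assumes that the tabulated values use the limiting form
exp(-dm/r) from (2) in place of (1 - d/r)^m; that assumption reproduces
several table entries but is not stated in the note, and the function
names are hypothetical.

```python
import math

def lower_bound_P(m, d, r):
    # Inequality (1): P(m, d, r) >= (1 - d/r)^m
    return (1.0 - d / r) ** m

def error_bound(dm, r, t):
    # Probability of missing a witness in t independent evaluations,
    # using the limiting value exp(-dm/r) of (1 - d/r)^m from (2).
    return (1.0 - math.exp(-dm / r)) ** t

for t in (10, 20, 30, 50, 100):
    print(t, error_bound(10, 10, t))
```

With dm = r = 10 the printed values fall from about 10^-2 at t = 10 to
about 10^-6 at t = 30, in line with the first row of Table 1.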


References

[1]  Z. Manna, Mathematical Theory of Computation
(McGraw-Hill, New York, 1974).

[2]  J.R. Brown, M. Lipow, Testing for software reliability,
Intern. Conf. on Reliable Software, SIGPLAN Notices
10, 6 (June 1975) 518-527.

[3]  A.I. Llewelyn, R.T. Wilkins, The testing of computer
software [remainder of entry illegible in this copy].

[4]  R.A. DeMillo, R.J. Lipton, A.J. Perlis, Social processes
and proofs of theorems and programs, Fourth ACM
Symposium on Principles of Programming Languages
(to appear in CACM).

[5]  W.E. Howden, Algebraic program testing, Computer
Science Technical Report No. 14 (November 1976),
UC-San Diego, La Jolla, CA.

[6]  M.O. Rabin, Probabilistic algorithms, in: J. Traub, ed.,
Algorithms and Complexity (Academic Press, New
York, 1976) 21-40.

[7]  P. Erdos, J. Spencer, Probabilistic Methods in
Combinatorics (Academic Press, New York, 1974).




MUTATION ANALYSIS OF DECISION TABLE PROGRAMS

Timothy A. Budd
Richard J. Lipton

Department  of  Computer  Science 
Yale  University 
New  Haven,  Connecticut  06520 


I. INTRODUCTION

For years computer programmers have been
testing programs on small sets of test data in
order to infer correctness, on the assumption
that if the program works correctly on a certain
set of "hard" test data, it will probably work
correctly on any data. Of course, most programmers
have little more than an intuitive idea of
what represents "hard" data. Expressed in such
vague terms, such faith is obviously not well
founded.

Recently interest has increased in formalizing
the theoretical aspects of program testing
[4,7]. As in this earlier work, we can give a
formal interpretation to the ideas of program
testing as follows: We can view a program R
as being a function from an input domain to an
output domain. For every program R we can
assume there exists a function F which R was
intended to compute. The correctness question
can then be phrased as "is R a realization of
the function F?"

Previous work has been directed toward
finding a predicate P over the space of programs
and input domains such that, for a given
function F and program R, if a set of test
cases T satisfies P(T,R) and R correctly
computes F on T, then we can infer that R
is a realization of F. The results of [4]
show that such a predicate must always exist.
However, that does not imply that testing is
easy: it need not be the case that T is finite
or that P be decidable.

To  give  a  more  concrete  example:  For  any 
logical  expression  we  can  construct  a  program 
which  computes  the  value  of  that  expression 
over  a  set  of  boolean  inputs.  Suppose  we  have 
such  a  program  and  wish  to  assert  that  it  is  the 
constant  function  FALSE.  Any  predicate  which 
will  satisfy  the  above  will  imply  that  either 
1)  T  must  in  some  cases  be  exponential  in  the 
size of the program, or 2) P must solve an
NP-hard problem [1].

Given then the difficulty of the task of
finding test data that will without doubt show
that a program is correct, the goals of mutation
analysis are much simpler. In mutation analysis
we define a predicate P as before, and in addition
we have some measure M of the "syntactic
distance" between programs. The theorem we
then hope to prove is: Given a function F and
program R, if a set of test cases T satisfies
P(T,R) and R correctly computes F on T,
then either 1) R is a correct realization
of F, or 2) R is (in terms of the measure)
very far away from ALL programs that correctly
realize F.


Informally, the latter condition can be
stated by saying if R is incorrect, it is
incorrect in a very radical fashion. Of course,
the truth and/or utility of such theorems
depends very strongly on the predicates and
measures chosen. Previous papers on mutation
analysis have demonstrated a large body of
empirical evidence showing that for a very realistic
problem domain (FORTRAN programs) there is a
relatively easy to verify predicate and a natural
measure for which that theorem seems to be true
[2]. The present paper presents for the first
time analytical results for a simpler problem
domain.

This paper proves analytically a theorem
similar to the one described above, but for the
problem domain of decision table programs. In
section II we formally define decision table
programs. In section III we define a measure on
the space of decision table programs, and introduce
the concept of mutation. Section IV contains
the main results of the paper, leading up
to a formal theorem similar to the one given
above. In section V we comment on the complexity
of mutant analysis, as opposed to explicit
enumeration, and conclude the paper with some
open problems and directions for future research.

II.  DECISION  TABLE  PROGRAMS 

Decision tables are a method of organizing
rules that specify the conditions under which
certain actions are to be performed. Decision
tables are chiefly used in business and data
processing applications [5,6], although in [4] they
are used as a means of organizing test data
selection predicates.

We can abstract the notion of a Decision
Table Program as follows [5,6]. We have first a
set of n Conditions and a set of p Actions.
The conditions are a set of predicates in some
language, say English or FORTRAN. The actions
are given in the same language, and are assumed
to be independent; that is, the results of
executing any subset of the actions are independent
of the order in which they are executed.
(Alternatively, we could merely define an ordering on
the actions.) The actions are also assumed to
be detectable; that is, given input and output
data, it is possible to tell precisely what
actions were executed on the input to produce the
output.

The decision table itself is then composed
of two matrices: an n by m condition matrix
and a p by m action matrix. We say the
program contains m RULES where each rule
corresponds to a cross section along columns of the
condition and action matrices.




Elements of the condition matrix contain
one of three values: Y, N or * (read YES, NO
and DON'T CARE). Elements of the action matrix
contain one of two values: X or blank.

To execute the program on a selected data
point we proceed as follows: first we evaluate
each of the conditions on the data, forming a
vector of size n containing YES/NO values.
We then consider each of the m rules (in some
unspecified order). If any rule is SATISFIED
in the sense that for each position in the
rule that contains a Y the data satisfies the
indicated condition, and fails to satisfy the
conditions associated with a N, then the actions
part of the rule is executed.
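
The execution procedure just described can be sketched in a few lines.
The representation is an assumption made for illustration: conditions are
Python predicates, each rule is a pair of columns (condition entries over
Y/N/*, action entries over X/blank), and the result is the set of action
indices executed. All names and the example table are hypothetical.

```python
def satisfies(cond_column, truth):
    # Y entries must hold, N entries must fail, '*' entries are ignored.
    return all(entry == '*' or (entry == 'Y') == value
               for entry, value in zip(cond_column, truth))

def execute(conditions, rules, data):
    """Run a decision table program on one data point.

    conditions: list of n predicates on the data
    rules:      list of m (cond_column, action_column) pairs
    Returns the set of indices of the actions that were executed.
    """
    truth = [bool(c(data)) for c in conditions]   # YES/NO vector of size n
    executed = set()
    for cond_column, action_column in rules:
        if satisfies(cond_column, truth):
            executed |= {j for j, a in enumerate(action_column) if a == 'X'}
    return executed

# Hypothetical table: condition 0 is "x > 0", condition 1 is "x is even".
conditions = [lambda x: x > 0, lambda x: x % 2 == 0]
rules = [(('Y', '*'), ('X', ' ')),   # rule 1: positive      -> action 0
         (('N', 'Y'), (' ', 'X'))]   # rule 2: non-positive, even -> action 1

print(execute(conditions, rules, 3), execute(conditions, rules, -2))
```

The sketch executes every satisfied rule; for a consistent table at most
one rule fires per data point, so the distinction does not arise.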

If for each possible data vector there is at
least one rule that can be satisfied we say the
decision table is COMPLETE. We say it is
CONSISTENT if there is at most one rule. There
are mechanical methods to determine whether an
arbitrary decision table program is complete
and/or consistent [5].
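
One brute-force way to carry out such a check is to enumerate all truth
vectors. This is a sketch, not the method of [5]; it assumes every one of
the 2^n YES/NO combinations can actually arise from some data point, and
it is exponential in n, so it is only suitable for small tables. The
function names are hypothetical.

```python
from itertools import product

def satisfies(cond_column, truth):
    # Y entries must hold, N entries must fail, '*' entries are ignored.
    return all(entry == '*' or (entry == 'Y') == value
               for entry, value in zip(cond_column, truth))

def classify(cond_columns, n):
    """Return (complete, consistent) for the m given condition columns."""
    counts = [sum(satisfies(col, truth) for col in cond_columns)
              for truth in product([True, False], repeat=n)]
    complete = all(c >= 1 for c in counts)     # at least one rule everywhere
    consistent = all(c <= 1 for c in counts)   # at most one rule everywhere
    return complete, consistent

print(classify([('Y', '*'), ('N', 'Y')], 2))
print(classify([('Y', '*'), ('N', 'Y'), ('N', 'N')], 2))
print(classify([('Y', '*'), ('*', 'Y')], 2))
```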

We can assume that no two rules specify
exactly the same set of actions. We can do this
without loss of generality since two rules that
specify the same actions can be combined with at
most the addition of one new condition row.

III. ERRORS AND MUTATIONS

We will say a program is correct if it
correctly realizes the function it was intended to.
A program is incorrect if it is not correct;
that is, there is at least one point at which
the program and the function compute differing
results.

Given a decision table program P, let S
be the set of all decision table programs having
the same conditions, actions and number of rules
as P. The definition implies that each program
in S can differ from P only in the
entries it contains in its matrices.

We will say a program P is radically
incorrect if not only does it not correctly
compute the function it was intended to, but no
program in S computes this function either.
A radically incorrect program cannot just have
a few table entries wrong, but must be wrong in
its conditions, actions, or in the number of
rules it contains.

Define a subset M of S (in [2] these
are called the mutants of P) to be the set of
programs formed by making changes to a single
entry in the tables representing P. We can
classify those changes into four types as
follows:

TYPE 1 CHANGE: A Y or N entry is changed to a *.
TYPE 2 CHANGE: A Y is changed to a N or vice
versa.
TYPE 3 CHANGE: A * is changed to a Y or N.
TYPE 4 CHANGE: An X is changed to a blank or
vice versa.
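
The four change types can be enumerated mechanically. The following is a
sketch under an assumed representation (each matrix stored as a list of
rule columns, each column a tuple of entries); the generator name is
hypothetical. Each condition entry has two alternative values and each
action entry has one.

```python
from collections import Counter

def mutants(cond, act):
    """Yield (change_type, new_cond, new_act) for every single-entry change.

    cond: list of m condition columns with entries in 'Y', 'N', '*'
    act:  list of m action columns with entries in 'X', ' '
    """
    for i, column in enumerate(cond):
        for k, old in enumerate(column):
            for new in 'YN*':
                if new == old:
                    continue
                if new == '*':
                    kind = 1          # Y or N changed to *
                elif old == '*':
                    kind = 3          # * changed to Y or N
                else:
                    kind = 2          # Y and N exchanged
                changed = column[:k] + (new,) + column[k + 1:]
                yield kind, cond[:i] + [changed] + cond[i + 1:], act
    for i, column in enumerate(act):
        for k, old in enumerate(column):
            new = ' ' if old == 'X' else 'X'   # type 4: X <-> blank
            changed = column[:k] + (new,) + column[k + 1:]
            yield 4, cond, act[:i] + [changed] + act[i + 1:]

cond = [('Y', '*'), ('N', 'Y')]      # n = 2 conditions, m = 2 rules
act = [('X', ' '), (' ', 'X')]       # p = 2 actions
kinds = Counter(kind for kind, _, _ in mutants(cond, act))
print(sorted(kinds.items()))
```

For this 2-rule example the generator produces 12 mutants in total, 8
from the condition matrix and 4 from the action matrix.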

Notice that even if P is complete and/or
consistent, members of M need not share this
property.

It may happen that certain members of M
will be equivalent to P, that is, will evaluate
identically to P on all inputs. An equivalent
mutant cannot be produced by a type 2
change, since if a type 2 change is made the set
of values satisfying the conditions of the
original rule is totally disjoint from that
which satisfy the altered rule. Notice also
that, by the assumption of detectability, no type
4 change can produce an equivalent mutant. If
any equivalent mutant can be produced by a type
1 change, let P be this mutant and repeat the
previous procedure. After a bounded number of
such iterations, we will have a program P'
equivalent to the original such that the only
equivalent mutants of P' are produced by a type 3
change.

Having accomplished the procedure described
in the previous paragraph, we construct a test
set T such that every mutant that is not
equivalent to P' differs on at least one data point
d in T. We shall have more to say about the
construction of T in section IV. Such a test
set is called adequate in [2].
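
Adequacy in this sense can be checked by brute force on small tables. In
the sketch below, test points are represented directly as truth vectors
of condition outcomes, and equivalence is decided by comparing a mutant
with the program on all 2^n truth vectors; both choices are assumptions
made for illustration (they presume every combination of outcomes is
attainable), and all names are hypothetical.

```python
from itertools import product

def run(program, truth):
    # program = (cond, act): tuples of m columns; returns executed actions.
    cond, act = program
    out = set()
    for ccol, acol in zip(cond, act):
        if all(e == '*' or (e == 'Y') == v for e, v in zip(ccol, truth)):
            out |= {j for j, a in enumerate(acol) if a == 'X'}
    return frozenset(out)

def single_entry_mutants(program):
    cond, act = program
    for which, table, alphabet in ((0, cond, 'YN*'), (1, act, 'X ')):
        for i, col in enumerate(table):
            for k, old in enumerate(col):
                for new in alphabet:
                    if new != old:
                        new_col = col[:k] + (new,) + col[k + 1:]
                        new_table = table[:i] + (new_col,) + table[i + 1:]
                        yield (new_table, act) if which == 0 else (cond, new_table)

def is_adequate(program, tests):
    n = len(program[0][0])
    everything = list(product([True, False], repeat=n))
    for mutant in single_entry_mutants(program):
        equivalent = all(run(program, d) == run(mutant, d) for d in everything)
        killed = any(run(program, d) != run(mutant, d) for d in tests)
        if not equivalent and not killed:
            return False          # a non-equivalent mutant survives
    return True

P = ((('Y', '*'), ('N', 'Y')), (('X', ' '), (' ', 'X')))
all_points = list(product([True, False], repeat=2))
print(is_adequate(P, all_points), is_adequate(P, [(True, True)]))
```

In the example, the single point (True, True) is not adequate because it
never exercises the second rule, so changes to that rule's actions go
unnoticed.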

We will now consider what the construction
of such a test set tells us about P.

IV. CORRECTNESS RESULTS

In this section we will denote the decision
table program under consideration by P. We
denote by $P_i$ the ith rule, that is, the
cross-section of the condition and action matrices
taken along the ith column.

For brevity we will state the following
conditions once. They are assumed to hold in all
theorems given.

1) P is consistent, although it need not
be complete.

2) The only mutants equivalent to P are
produced by a type 3 change.

3) Each element d of the test set T
satisfies a rule, and the results of P
on d are correct.

LEMMA 1: For each rule $P_i$ in P, there exists
a data point d in T satisfying the conditions
of $P_i$.

PROOF: If we assume to the contrary that no data
item satisfies some rule, then the action
portion of that rule can be mutated in any fashion
with no perceptible change, contradicting the
assumption concerning the construction of T.

THEOREM 1: No program P' in S that differs
from P by at least one type 4 change can
evaluate correctly on each data point in T.

PROOF: The proof is a simple pigeon-hole
argument. By Lemma 1, for every rule there is at
least one data item that executes it. By assumption,
each rule's actions are unique and detectable,
hence any program that evaluates correctly on
T must contain at least the m action parts of
P. But no program in S can contain more
rules, hence the result follows.

Theorem 1 implies a one-to-one correspondence
between rules in P and rules in any other
program in S that evaluates correctly on T.
We will use this fact implicitly in what follows.

LEMMA 2: Any program P' in S that is not
equivalent to P but that evaluates correctly
on T must contain at least one change that, by
itself, would produce a non-equivalent mutant.

PROOF: Assume that P' and P differ only by
changes that, by themselves, construct equivalent
mutants. By construction, we get that P'
and P differ by type 3 changes. But the fact
that these are equivalent changes implies that
within each rule the conjunction of the conditions
P and P' have in common is sufficient
to imply the conditions at each of the disputed
points, hence P must be equivalent to P'.

The remainder of this section is devoted
to showing by cases that there cannot be any
program P' in S that is not equivalent to
P but that evaluates correctly on T.

Assume we have a program P' in S that
is not equivalent to P but that evaluates
correctly on T. By Lemma 2, there must be at
least one change between P and P' that if
made to P would produce a non-equivalent
mutant. Let P* be the mutant so formed, with
$P_i$ ($P^*_i$) indicating the single rule that has
been altered.

Theorem 1 tells us that the change could
not have been of type 4; the next three theorems
show us it cannot have been of types 1, 2 or 3
either.

THEOREM 2: The difference between P and P*
cannot be a type 3 change.

PROOF: Assume we have a program for which the
hypothesis holds but deny the conclusion. Since
P* is not equivalent to P there exists an
element d in T such that P(d) and P*(d)
differ. But this can only happen if d satisfies
the conditions associated with $P_i$ but not
those of $P^*_i$. But one can see then that no
other change that can be made to $P^*_i$ will
allow d to satisfy its conditions. Since
P(d) executed the correct actions and no other
rule has the same actions, a contradiction is
obtained.

THEOREM 3: The difference between P and P*
cannot be a type 2 change.

PROOF: Assume as before we have a program for
which the hypothesis is true but deny the
conclusion. By Lemma 1, there exists some element
d in T that satisfies the conditions in the
original rule. Since d cannot possibly satisfy
any rule that includes the change under
consideration, the fact that we were satisfied with
the actions of P on d and no other rules can
have these same actions gives us a contradiction.

THEOREM 4: The difference between P and P*
cannot be a type 1 change.

PROOF: Assume we have a program for which the
hypothesis is true but deny the conclusion. By
construction, there exists an element d in T
such that P and P* differ, but this can only
mean that d satisfies $P^*_i$ and not $P_i$.
Since we were satisfied with P(d) there must
exist at least one more change of type 2 or 3
between $P_i$ and the associated rule in P'
which allows us to reject d in P'. Furthermore,
this change cannot produce an equivalent
mutant. But using Theorems 2 and 3 this gives
us a contradiction.

Combining  Theorems  1-4  then  gives  us  the 
main  result  of  this  paper. 

THEOREM 5: If P evaluates correctly on T
then either P is correct or it is radically
incorrect.

V.  THE  COMPLEXITY  OF  MUTANT  ANALYSIS 

If we think of each of the n conditions
as dividing the space of possible test cases in
two, there are then possibly $2^n$ potential
categories a test case could fall into. A test
procedure that operated by explicitly constructing
a representative from each of these categories
might then require an exponential number
of test cases (in the size of the matrix). We
shall see that mutation analysis requires
significantly fewer test cases.

It is not difficult to see that if it is
possible to differentiate a program P from a
mutant P*, then it is possible to differentiate
it with a single test case. Since there
are at most $2nm + pm$ mutants (two alternatives
for each of the nm condition entries and one for
each of the pm action entries), we have the
following theorem.

THEOREM 6: If for a given program P there
exists an adequate test set T with respect to
the mutant operations, then there exists a test
set T' with no more than $2nm + pm$ elements
that is also adequate.

The theorem is strengthened by the fact
that the constructive method of mutant testing,
that is, choosing a mutant and finding a test
case to eliminate it, results in a test set of
no more than the indicated size. Furthermore,
it is probable that the test set will be much
smaller, since empirical evidence suggests that
a single test case may eliminate a large number
of mutants [3].

VI.  CONCLUSIONS 

In this paper it has been shown that by
using mutation analysis a relatively small set
of test cases (linear in the size of the decision
table, versus exponential for explicit
enumeration) can be used to infer a very strong
conclusion concerning the correctness of a
program. We can show, using our methods, that if
a program satisfies these test cases then if it
is incorrect, it is incorrect in a very dramatic
fashion, and it may be possible, using other
methods (say specification), to insure that
this is not the case.

This result is in striking contrast to the
usual view, which holds that testing is of
almost no help in showing a program correct. In
view of the complex problems associated with
program proving, we feel it makes good economic
sense to investigate the capabilities of testing
performed in a systematic and rational fashion.

These results suggest a paradigm for
research in other models of programming.
Possibilities for future work are linear recurrences
(LISP type functions), or partial recursive
functions.

On the other hand, in [2] a number of
empirical observations of FORTRAN programs are
given that fortify the hope that a theory along
the lines of the one presented here might be
developed for that problem domain.

Taken all together, this data suggests that
in the future mutation analysis may become an
important new tool in the field of program
testing.

ACKNOWLEDGEMENTS: We would like to thank Fred
Sayward for a careful reading of an earlier
draft, and Richard Ladner for helping to
simplify some of the theorems.

[1]  A. Aho, J. Hopcroft and J. Ullman.
The Design and Analysis of Computer
Algorithms.
Addison-Wesley, Reading, Mass., 1976.

[2]  R. DeMillo, R. Lipton and F. Sayward.
"Program Mutation: A Method of Determining
Test Data Adequacy."
To be presented at the ONR and Navy Computing
Labs workshop on Software Technology
Transfer (April 1978).

[3]  R. DeMillo, R. Lipton and F. Sayward.
"Hints on Test Data Selection."
To appear in Computer, Vol. 11(4), April
1978.

[4]  J.B. Goodenough and S.L. Gerhart.
"Towards a Theory of Test Data Selection."
IEEE Trans. on Software Engineering, SE-1,2
(June 1975).

[5]  M. Montalbano.
Decision Tables.
Science Research Associates, Chicago, 1974.

[6]  S.L. Pollack, H.T. Hicks and W.J. Harrison.
Decision Tables: Theory and Practice.
John Wiley & Sons, New York, 1971.

[7]  R.T. Yeh (ed.).
Current Trends in Programming Methodology,
Vol. II.
Prentice-Hall, Englewood Cliffs, N.J.,
1977.

This  work  was  partially  supported  by  ONR  Grant 
N00014-75-C-0752  and  NSF  Grant  MCS-76-8148b . 




PROVING  LISP  PROGRAMS  USING  TEST  DATA 

Timothy  A.  Budd 

Richard J. Lipton

Yale  University 

Department  of  Computer  Science 
New  Haven,  Ct. 

and 

University  of  California  at  Berkeley 
Computer  Science  Division 
Berkeley,  Calif. 


1. INTRODUCTION

An idea proposed in [1] is the concept of proving
individual programs correct with respect to
some larger class of programs. That is, instead of
proving a program correct we prove that either a)
the program is correct, OR b) no program in this
class realizes the intended function. It is assumed
that most programmers at least know if the function
they are trying to compute can be realized in some
large class of programs, and therefore from a
theoretical point of view the introduction of this
disjunction may make the task of validating programs
vastly easier.

A  previous  paper  has  analysed  programs  written 
in  a  decision  table  format  [4].  In  this  paper  we 
will  be  concerned  with  lisp  programs  composed  of 
CAR,  CDR  and  CONS  with  lisp  predicates  composed  of 
CAR,  CDR  and  ATOM.  Similar  classes  of  programs  have 
been  studied  in  [5,6,7]. 

Associated with each S-Expression X we can construct a
binary tree as follows: Consider the infinite binary tree
where each left arc is marked CAR and each right arc CDR
(call this the complete CAR/CDR tree.) Starting with X at
the root of the tree, travel down each arc in turn taking
the appropriate CAR or CDR. Prune the complete tree each
time you reach an atom. The resulting finite binary tree
will be called the projection of X (or PROJ[X]). An example
is shown in figure 1. Notice that PROJ[X] is a
representation of the structure of X, and is invariant
under the renamings of the atoms of X.
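As an illustration of the projection, the following is a minimal sketch in Python. It assumes an encoding that is ours and not the paper's: atoms are strings, a non-atomic S-expression is a 2-tuple (CAR, CDR), and a position in the complete CAR/CDR tree is a string over "a" (CAR) and "d" (CDR) read from the root downward. The name proj is hypothetical.

```python
def proj(x, path=""):
    """Return PROJ[x] as the set of CAR/CDR paths present in x."""
    positions = {path}
    if isinstance(x, tuple):                 # non-atomic node: follow both arcs
        car, cdr = x
        positions |= proj(car, path + "a")   # left arc is CAR
        positions |= proj(cdr, path + "d")   # right arc is CDR
    return positions                         # an atom prunes the tree here


x = (("A", "B"), "C")
y = (("P", "Q"), "R")                        # same shape, atoms renamed
print(sorted(proj(x)))
print(proj(x) == proj(y))
```

Under this encoding the two inputs have the same projection, which is the invariance under renaming noted in the text.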




We can define a relation < as follows. Given two
S-expressions X and Y we will say X < Y if PROJ[X] is the
intersection of PROJ[X] and PROJ[Y]. Using this relation
one can show the set of lisp structures form a lattice.
(The proofs can be adapted from Summers [7], although he
defines the projection slightly differently.)
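The ordering can be sketched in the same way. The code below is only an illustration under our own assumptions: atoms are strings, non-atomic S-expressions are 2-tuples, positions are strings over "a" and "d", and the names below and meet are hypothetical. The function meet builds a structure whose projection is the intersection of the two projections, which is one way to picture the greatest lower bound.

```python
def proj(x, path=""):
    """Set of CAR/CDR paths present in x (atoms are strings, pairs are 2-tuples)."""
    positions = {path}
    if isinstance(x, tuple):
        positions |= proj(x[0], path + "a")
        positions |= proj(x[1], path + "d")
    return positions


def below(x, y):
    """X < Y in the sense of the text: PROJ[X] is the intersection of PROJ[X] and PROJ[Y]."""
    return proj(x) == (proj(x) & proj(y))


def meet(x, y):
    """Build a structure whose projection is PROJ[x] intersected with PROJ[y]."""
    shared = proj(x) & proj(y)

    def grow(path):
        if path + "a" in shared:             # both inputs are non-atomic here
            return (grow(path + "a"), grow(path + "d"))
        return "m" + path                    # fresh atom named after its position

    return grow("")


small = ("A", "B")
big = (("C", "D"), "E")
other = ("F", ("G", "H"))
print(below(small, big), below(big, small))
print(meet(big, other))
```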


We will make the convention that all S-Expressions (we
will use the less clumsy expression point) have unique
atoms. Certainly if two programs agree on all such points
they must agree on all inputs. Hence we can do this without
loss of generality.


We will call a lisp program a Selector program if it is
composed of just CAR and CDR. We will call it a Straight
line program if it is a selector program or is formed by
CONS on either selectors or other straight line programs.
We will call it a Predicate program if it has the following
form

    COND( ATOM(Gi(X)) -> P1(X)
          T           -> P2(X) )

where the G's are selectors and the P's are straight line
programs or other predicate programs.
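To make the three program classes concrete, here is a small interpreter sketch. The encoding is an assumption of ours and does not appear in the paper: a selector is a string over "a"/"d" applied from the root downward (so the Lisp name CADR corresponds to "da"), a straight line program is a selector or ("cons", p1, p2), and a predicate program is ("cond", g, p1, p2), meaning COND(ATOM(g(X)) -> p1(X), T -> p2(X)).

```python
class Undefined(Exception):
    """Raised when CAR or CDR is applied to an atom."""


def select(path, x):
    """Apply a selector, given as a root-to-leaf string of 'a' (CAR) / 'd' (CDR)."""
    for step in path:
        if not isinstance(x, tuple):
            raise Undefined("selector runs past an atom")
        x = x[0] if step == "a" else x[1]
    return x


def run(prog, x):
    """Evaluate a selector, straight line, or predicate program on the point x."""
    if isinstance(prog, str):                    # selector program
        return select(prog, x)
    if prog[0] == "cons":                        # straight line program
        return (run(prog[1], x), run(prog[2], x))
    if prog[0] == "cond":                        # predicate program
        _, guard, then_prog, else_prog = prog
        is_atom = not isinstance(select(guard, x), tuple)
        return run(then_prog if is_atom else else_prog, x)
    raise ValueError("unknown program form")


swap = ("cons", "d", "a")                        # CONS(CDR(X), CAR(X))
pred = ("cond", "a", "a", ("cons", "aa", "d"))   # test ATOM(CAR(X))
print(run(swap, ("A", "B")))
print(run(pred, ("A", "B")), run(pred, (("A", "B"), "C")))
```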


Assume we have a function F which we know can be computed
by a program in some schemata class S. We have a program P
in S which we wish to show computes F. We assume we have
some method of verifying that P(X)=F(X) on a finite number
of test cases (say by hand calculation.) We wish to show
that there exists a finite set of test cases T such that if
P correctly computes F on every element of T then either
1) P correctly computes F for all inputs, or 2) no program
in the schemata class S correctly computes F. This goal is
similar to that of mutation analysis [1-4].

Call such a test set Adequate.

We  then  wish  to  discover  conditions  under  which 
we  can  construct  adequate  test  data. 

2.  STRAIGHT  LINE  PROGRAMS 

We will say a program P(X) is Well formed if for every
occurrence of the construction CONS(A,B) it is the case
that A and B do not share an immediate parent in X. The
intuitive idea of the definition should be clear: a program
is well formed if it is not doing any more work than it
needs to. Notice that being well formed is an observable
property of programs, independent of testing.
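Because well-formedness is a syntactic property, it can be checked without running the program. The sketch below uses our own encoding (selectors as strings over "a"/"d" read from the root, CONS as ("cons", p1, p2)) and one particular reading of "share an immediate parent", namely that A is the CAR and B is the CDR of the same node, so that CONS(A,B) merely rebuilds that node. Both the encoding and this reading are assumptions, not statements from the paper.

```python
def rebuilds_a_node(a, b):
    """True when a and b are the CAR and the CDR of the same position in X."""
    return (isinstance(a, str) and isinstance(b, str)
            and a.endswith("a") and b.endswith("d")
            and a[:-1] == b[:-1])


def well_formed(prog):
    """Check every CONS(A, B) in a straight line program."""
    if isinstance(prog, str):                # a selector is always well formed
        return True
    _, left, right = prog
    if rebuilds_a_node(left, right):         # CONS(CAR(g(X)), CDR(g(X))) is just g(X)
        return False
    return well_formed(left) and well_formed(right)


print(well_formed(("cons", "a", "d")))       # rebuilds X itself
print(well_formed(("cons", "d", "a")))       # swaps the two halves of X
```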

We can define a measure of the complexity of straight
line programs by their CONS-depth, where CONS-depth is
defined as follows:

1) The CONS-depth of a selector function is zero.

2) The CONS-depth of a straight line program P(X) =
CONS(P1(X),P2(X)) is 1 + MAX(CONS-depth(P1),
CONS-depth(P2)).
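The two clauses translate directly into a recursive function. This is a sketch under our own encoding (selectors as strings, CONS as ("cons", p1, p2)); the name cons_depth is hypothetical.

```python
def cons_depth(prog):
    """CONS-depth of a straight line program."""
    if isinstance(prog, str):                # clause 1: a selector has depth zero
        return 0
    _, left, right = prog                    # clause 2: prog is CONS(left, right)
    return 1 + max(cons_depth(left), cons_depth(right))


print(cons_depth("ad"))
print(cons_depth(("cons", "a", ("cons", "d", "a"))))
```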

LEMMA  1:  If  any  two  selector  programs  compute 
identically  on  any  point  X,  they  must  compute 
identically  on  all  points. 

PROOF: The only power of a selector program is
to choose a subtree out of its input and return
it. We can view this process as selecting a
position in the complete CAR/CDR tree and
returning the subtree rooted at that position.
Since there is a unique path from the root to
this position, there is a unique selector
which selects it out. Since atoms are unique, by
merely observing the output we can infer the
subtree which was selected. The result then
follows.
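The argument of Lemma 1 can be observed on a small case. The following sketch (our own encoding again: strings for atoms, 2-tuples for pairs, strings over "a"/"d" for selectors) enumerates every selector of length at most three and confirms that, on a point with unique atoms, no two of them return the same value. It illustrates the lemma on one point; it does not prove it.

```python
from itertools import product


def select(path, x):
    """Apply a selector; return None if it runs past an atom."""
    for step in path:
        if not isinstance(x, tuple):
            return None
        x = x[0] if step == "a" else x[1]
    return x


def complete(depth, path=""):
    """Complete tree of the given depth whose atoms are named by their position."""
    if depth == 0:
        return "x" + path
    return (complete(depth - 1, path + "a"), complete(depth - 1, path + "d"))


point = complete(3)
selectors = ["".join(p) for n in range(4) for p in product("ad", repeat=n)]
outputs = [select(s, point) for s in selectors]
distinct = len(set(outputs)) == len(selectors)
print(len(selectors), distinct)
```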


LEMMA  2:  If  two  well  formed  programs  compute 
identically  on  any  point  then  they  must  have 
the  same  CONS-depth. 

PROOF: Assume we have two programs P1 and P2
and a point X such that P1(X) = P2(X) yet
CONS-depth(P1) < CONS-depth(P2). This then
implies that there is at least one subtree in
the structure of P2(X) which was produced by
CONSing two straight line programs while the same
subtree in P1(X) was produced by a selector.
But then the objects CONSed must have an
immediate ancestor in X, contradicting the fact
that P2 is well formed.

THEOREM 1: If two well formed straight line
programs  agree  on  any  point  X  then  they  must 
agree  on  all  points. 

PROOF:  The  proof  will  be  by  induction  on  the 
CONS-depth.  By  lemma  2  any  two  programs  which 
agree  at  X  must  have  the  same  CONS-depth.  By 
lemma  1  the  theorem  is  true  for  programs  of 
CONS-depth  zero.  Hence  we  will  assume  it  is 
true  for  programs  of  CONS-depth  n  and  show  the 
case  for  n+1 . 

If program P1 has CONS-depth n+1 then it must be of the
form CONS(P11,P12) where P11 and P12 have CONS-depth no
greater than n. Assume we have two programs P1 and P2 in
this fashion. Then for all Y:

    P1(Y) = P2(Y) IFF
    CONS(P11(Y),P12(Y)) = CONS(P21(Y),P22(Y)) IFF
    P11(Y) = P21(Y) and P12(Y) = P22(Y)

Hence by the induction hypothesis P1 and P2 must agree for
all Y.
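One consequence of Theorem 1 is that the output on a single unique-atom point determines the program. The sketch below recovers a straight line program from one input/output pair, under our own encoding (atoms as strings, pairs as 2-tuples, selectors as strings over "a"/"d", CONS as ("cons", p1, p2)). The helper names are hypothetical, and the procedure is an illustration of the theorem rather than a construction given in the paper.

```python
def subtrees(x, path=""):
    """Yield (position, subtree) for every position in x."""
    yield path, x
    if isinstance(x, tuple):
        yield from subtrees(x[0], path + "a")
        yield from subtrees(x[1], path + "d")


def recover(x, out):
    """Recover a straight line program P with P(x) == out, x having unique atoms."""
    where = {sub: path for path, sub in subtrees(x)}   # unique atoms: one path each

    def go(value):
        if value in where:                   # produced by a selector
            return where[value]
        if not isinstance(value, tuple):
            raise ValueError("output contains an atom absent from the input")
        return ("cons", go(value[0]), go(value[1]))    # produced by CONS

    return go(out)


x = (("A", "B"), ("C", "D"))
print(recover(x, (("C", "D"), "A")))
print(recover(x, x))
```

A subtree of the input is always explained by a selector before CONS is tried, so the recovered program never rebuilds a node it could have selected.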

We define a test point to be Generic if by itself it
constitutes an adequate test set as defined in the
introduction.

Corollary: For any well formed straight line lisp program,
any unique atomic point for which the function is defined
is generic.

3. PREDICATE PROGRAMS

We  can  view  the  structure  of  a  predicate  pro¬ 
gram  as  a  binary  tree.  Associated  with  each  interior 
node  is  a  predicate  and  associated  with  each  leaf  is 
a  straight  line  program  (see  figure.)

We  will  call  a  predicate  program  Well  formed  if 

1)  each  of  the  straight  line  programs  associated 
with  each  leaf  are  well  formed,  and 

2) for each leaf on the space of all possible inputs
there is at least one item which passes all conditions
leading to that leaf and causes the associated straight
line program to be executed.

Notice that whether a program is well formed or not is an
observable fact independent of testing.




For notation we will denote the leaves going from left to
right by l_i, i=1,...,n. Let e_i, i=1,...,n be the set of
straight line programs associated with the leaves. We will
assume that for no i,j with i not equal to j is it the case
that e_i is equivalent to e_j. Notice again theorem 1 gives
us an effective method to test this.

Given a well formed predicate program P in S we construct
a set of n data points d_1, ..., d_n such that d_i follows
the path to leaf l_i and executes the program e_i
correctly. Call this set T1. There is an obvious effective
procedure to generate such a test set.
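One possible form of the effective procedure for T1 is sketched below. It is our own construction under assumed encodings (selectors as strings over "a"/"d", CONS as ("cons", p1, p2), predicate programs as ("cond", g, p_then, p_else)); the paper does not spell the procedure out. For each leaf it collects the positions that must be atomic and non-atomic along the path, adds the positions the leaf program reads, and grows the smallest point meeting those constraints.

```python
def leaves(prog, atomic=(), nonatomic=()):
    """Yield (atomic paths, non-atomic paths, leaf program), leaves left to right."""
    if isinstance(prog, tuple) and prog[0] == "cond":
        _, guard, then_prog, else_prog = prog
        yield from leaves(then_prog, atomic + (guard,), nonatomic)
        yield from leaves(else_prog, atomic, nonatomic + (guard,))
    else:
        yield atomic, nonatomic, prog


def selectors(prog):
    """All selector paths used by a straight line program."""
    if isinstance(prog, str):
        return {prog}
    return selectors(prog[1]) | selectors(prog[2])


def build_point(atomic, nonatomic, leaf):
    """Smallest unique-atom point reaching this leaf, or None if unreachable."""
    must_exist = set(atomic) | set(nonatomic) | selectors(leaf)
    must_split = set(nonatomic)
    for path in must_exist:                  # every proper prefix must be a pair
        must_split.update(path[:k] for k in range(len(path)))
    if must_split & set(atomic):             # a position cannot be both
        return None

    def grow(path):
        if path in must_split:
            return (grow(path + "a"), grow(path + "d"))
        return "x" + path                    # atom named after its position

    return grow("")


prog = ("cond", "a", ("cons", "d", "a"),
        ("cond", "d", "aa", ("cons", "ad", "dd")))
T1 = [build_point(*leaf) for leaf in leaves(prog)]
for point in T1:
    print(point)
```

A result of None for some leaf would mean that no input reaches it, that is, the program violates condition 2 of well-formedness.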

LEMMA 3: Given any well formed program P' in S which
evaluates correctly on each element of T1, every straight
line leaf program in P' must be exercised by at least one
data point d_i in T1.

PROOF: Assume we have a program P' satisfying the
hypothesis but for which the conclusion is false. By the
pigeon hole principle there must be at least two points
d_i and d_j which were evaluated by different leaves in P
but which are evaluated by the same leaf in P'. Let f
denote the straight line program which evaluates these
points in P'. Since the d points are generic this implies
that e_i is equivalent to f. But also e_j is equivalent to
f. Hence e_i must be equivalent to e_j which is a
contradiction.



Corollary: Given any well formed program P' in S which
evaluates correctly on each element of T, the leaf programs
of P' are simply a permutation of those of P.


It might seem that exercising all the paths of P' is
sufficient to show it is equivalent to P. But this is not
the case. We might simply have consistently chosen the
right path for the wrong reason. To rule out this
possibility requires a more stringent set of test cases.
We construct this test set in the following manner.


For each leaf l_i and for each element d_j in T1
construct a point d_ij in the following way. Consider the
infinite CAR/CDR tree. Color each point RED which is tested
and found to be atomic on the path leading to the leaf l_i.
Color the points which are tested and found to be non
atomic BLUE. As long as it is not contained in a subtree
rooted at a red point and does not contain a blue point in
its subtree, color a point red if it is atomic in d_j. As
long as it is not contained in a subtree rooted at a red
point, color a point blue if it is not atomic in d_j. d_ij
is then the smallest unique atomic point where the red
colored vertexes are atomic and the blue vertexes non
atomic.




Denote by T the set T1 augmented with these points.

THEOREM 2: Any well formed program P' in S which agrees
with P on T must agree with P on all points.

PROOF: Assume we have a program P' which satisfies the
hypothesis, yet there is a point X such that P(X) and
P'(X) differ.

The point X must be evaluated by some leaf l_i in P, hence
it must satisfy all the constraints associated with that
leaf.

This point is also evaluated by a leaf program e_k in P'.
By lemma 3 some data item d_j in T also executes this leaf
program. This implies that no matter what the constraints
are on this path in P' (and we make no assumptions about
what they might be) they cannot interfere with the
constraints along the path leading to l_i.

But this then necessarily implies that point d_ij would be
evaluated by e_i in P and e_k in P' where k is not equal
to i. Since d_ij is also generic, using the earlier
theorems a contradiction is obtained.

Corollary:  There  is  an  effective  procedure  to  con¬ 
struct  an  adequate  test  set  for  predicate  programs. 




4.  RECURSIVE  PROGRAMS 

We will define a class of programs (𝒮_n) as follows:

The input to the program shall consist of two sets of
variables: Selector variables, denoted x1,...,xm, and
Constructor variables, denoted y1,...,yp. A program will
consist of two parts, a program body and a recurser.

A program body consists of n statements, each statement
composed of two parts. The first part is a Predicate of
the form ATOM(t(x_i)) where t is a selector function and
x_i a selector variable. The second part is a straight
line output function over the selector and constructor
variables.

A recurser is divided into two parts. The constructor part
is composed of p assignment statements for each of the p
constructor variables where y_i is assigned a straight
line function of the selector variables and y_i. The
selector part is composed of m assignment statements for
the m selector variables so that x_i is assigned a
selector function of itself. The following diagram should
give a more intuitive picture of this class of programs.

Program P(x1,...,xm, y1,...,yp) =

    p1(x_i1) -> f1(x1,...,xm, y1,...,yp)
    p2(x_i2) -> f2(x1,...,xm, y1,...,yp)
    ...
    pn(x_in) -> fn(x1,...,xm, y1,...,yp)

    y1 <- g1(y1, x1,...,xm)
    ...
    yp <- gp(yp, x1,...,xm)

    x1 <- h1(x1)
    ...
    xm <- hm(xm)

Given  such  a  program,  execution  proceeds  as  follows: 
Each  predicate  is  evaluated  in  turn.  If  any  predicate  is 
undefined  so  is  the  result  of  the  execution,  otherwise  if 
any  predicate  is  TRUE  the  result  of  execution  is  the 
associated  output  function.  Otherwise  if  no  predicate 
evaluates  true  then  the  assignment  statements  in  the 
recurser  and  constructor  are  performed  and  execution  con¬ 
tinues  with  these  new  values. 
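The execution rule can be sketched as a small interpreter. Everything about the encoding is an assumption of ours: a program is a dictionary with keys "predicates" (pairs of variable name and selector path), "outputs", "constructors" and "selectors"; straight line terms are ("sel", variable, path) or ("cons", t1, t2); paths are strings over "a"/"d" read from the root. We also read the rule literally, evaluating all predicates before choosing the first true one, and we perform the recurser assignments simultaneously. The list-reversal program is a hypothetical example, not one taken from the paper.

```python
class Undefined(Exception):
    """Raised when CAR or CDR is applied to an atom."""


def select(path, value):
    for step in path:
        if not isinstance(value, tuple):
            raise Undefined("selector runs past an atom")
        value = value[0] if step == "a" else value[1]
    return value


def evaluate(term, env):
    """Evaluate a straight line term over the current variable values."""
    if term[0] == "sel":
        _, var, path = term
        return select(path, env[var])
    _, left, right = term
    return (evaluate(left, env), evaluate(right, env))


def run(program, env):
    """Run until some predicate is true; Undefined propagates to the caller."""
    env = dict(env)
    while True:    # terminates when every recursion selector is non trivial
        tests = [not isinstance(select(path, env[var]), tuple)
                 for var, path in program["predicates"]]
        for holds, output in zip(tests, program["outputs"]):
            if holds:
                return evaluate(output, env)
        new_env = {y: evaluate(t, env) for y, t in program["constructors"].items()}
        new_env.update({x: select(p, env[x]) for x, p in program["selectors"].items()})
        env = new_env                        # all assignments use the old values


reverse = {
    "predicates": [("x1", "")],                               # ATOM(x1)
    "outputs": [("sel", "y1", "")],                           # -> y1
    "constructors": {"y1": ("cons", ("sel", "x1", "a"), ("sel", "y1", ""))},
    "selectors": {"x1": "d"},                                 # x1 <- CDR(x1)
}
result = run(reverse, {"x1": ("A", ("B", ("C", "NIL"))), "y1": "NIL"})
print(result)
```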

We  will  say  a  variable  is  a  predicate  variable  if  it 
is  tested  by  at  least  one  predicate.  Similarly  it  is  an 
output  variable  if  it  is  used  in  at  least  one  output 
function.  A  variable  can  be  both  a  predicate  and  an  out¬ 
put  variable. 

We  will  make  the  following  restrictions  on  the  pro¬ 
grams  we  will  consider: 

1) every recursion selector and every constructor must be
non trivial.

2) every variable is either a predicate or an output
variable.

3) there is at least one output variable.

4) (freedom) for any 1 <= k <= n and l >= 0 there exists a
set of inputs which cause the program to recurse l times
before correctly exiting by output function k.

5) each output function is unique.

6) every constructor variable appears totally in at least
one output function.

Given a program P in 𝒮_n, let 𝒮 be the union of 𝒮_i
for i=1,...,n.

Let us assume we know, on independent grounds, that a
correct program P* exists in 𝒮, furthermore that no
predicate, output function, selector or constructor in P*
has a depth greater than some constant u>3.

GOAL: We wish to construct a set of test inputs with the
property that any program P in 𝒮 which executes correctly
on these values must then be equivalent to P*. The
existence of such a test set would then imply (under the
assumption that at least one correct program exists in 𝒮)
that P is correct.

We will use capital letters from the end of the alphabet
(X, Y and Z) to represent vectors of inputs. Hence we can
refer to P(X) rather than P(x1,...,xm,y1,...,yp).
Similarly we can abbreviate the simultaneous application
of constructor functions by C(X) and recursion selectors
by S(X).

We will use the initial greek letters to represent
positions in a variable, where a position is defined by a
finite CAR-CDR path from the root. When no confusion can
arise we will frequently refer to "position α in X"
whereby we mean position α in some x_i in X.

We can form a lattice on the space of inputs by saying
X < Y if and only if all selector variables in X are
smaller than their respective variables in Y, and
similarly the constructor variables.

We can define the notion of "Pruning X at position α" as
follows: We will say Y is X pruned at position α if Y is
the largest input < X where α is atomic. This process can
be viewed as simply taking the subtree in X rooted at α
and replacing it by a unique atom.
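Pruning is easy to state in code. The sketch below handles a single variable (for a vector of inputs the same step would be applied to the one variable containing the position). The encoding is assumed: atoms are strings, pairs are 2-tuples, positions are strings over "a"/"d"; the name prune and the default fresh atom "*" are ours.

```python
def prune(x, path, fresh="*"):
    """Replace the subtree of x rooted at the given position by a fresh atom."""
    if path == "":
        return fresh
    if not isinstance(x, tuple):
        raise ValueError("position does not exist in x")
    car, cdr = x
    if path[0] == "a":
        return (prune(car, path[1:], fresh), cdr)
    return (car, prune(cdr, path[1:], fresh))


x = (("A", "B"), ("C", "D"))
print(prune(x, "d"))
print(prune(x, "ad"))
```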

If a position α (relative to the original input) is
tested by some predicate we will say that the position in
question has been touched.

The assumption of freedom asserts only the existence of
inputs X which will cause us to recurse a specific number
of times and exit by a specific output function. Our first
lemma shows that this can be made constructive.

LEMMA 1: Given l >= 0 and 1 <= i <= n we can construct an
input X such that P(X) is defined and while executing X P
recurses l times before exiting by output function i.

PROOF: Consider m+p infinite trees corresponding to the
m+p input variables. Mark in BLUE every position which is
touched by a predicate function and found to be non-atomic
in order for P to recurse l times and reach the ith
predicate. Then mark in RED the point touched by the ith
predicate after recursing l times.

The assumption of freedom implies that no blue vertex can
appear in the infinite subtree rooted at the red vertex,
and that the red vertex can not also be marked blue.

Now mark in YELLOW all points which are touched by
constructor functions in recursing l times, and each
position touched by the ith output function after
recursing l times. The assumption of freedom again tells
us that no yellow vertex can appear in the infinite
subtree rooted at the red vertex. The red vertex may,
however, also be colored yellow, as may the blue vertexes.
It is a simple matter to then construct an input X such
that

1) all BLUE vertexes are non atomic in X,

2) the RED vertex is atomic, and

3) all YELLOW vertexes are contained in X (they may be
atomic).

It is trivial to verify that such an X satisfies our
requirements. □

Notice that the procedure given in the proof of lemma 1
allows us to find the smallest X such that the indicated
conditions hold. If α is the position touched by the ith
predicate after recursing l times call this point the
minimal α point.

Freedom implies no point can be twice touched, hence the
minimal α point is a well defined concept.

Given an input X such that P(X) is defined, let F_X(Z) be
the straight line function such that F_X(X) = P(X). Note
that by the property of being generic, F_X is defined by
this single point.

LEMMA 2: For any X for which P(X) is defined, we can
construct an input Y with the properties that P(Y) is
defined, Y > X and F_X ≠ F_Y.

PROOF: There exist some constants l and i such that on
input X P recursed l times before exiting by output
function i. Let the predicate p_i test variable x_j and
let s_j be the recursion selector for this variable.

There are two cases, depending upon whether the output
function f_i is constant or not. If f_i is not a constant
then since X is bounded there must be a minimal k > l such
that the predicate p_i(s_j^k(x_j)) is undefined.

By lemma 1 we can find an input Z which causes P to
recurse k times before exiting by output function i. Let
Y = X union Z. Since Y > Z P must recurse at least as much
on Y as it did on Z. Since the final point tested is still
atomic P(Y) will recurse k times before exiting by output
function i.

It is simple to verify the fact that F_X ≠ F_Y.

The second case arises when f_i is a constant function. By
assumption 6 there is at least one output function which
is not a constant function. Let f_h be this function. Let
the predicate p_h test variable x_j. The same argument as
before goes through with the exception that it may happen
by chance that P(Y) = P(X) (i.e. P(Y) returns the constant
value.) In this case we increment k by 1 and perform the
same process and it cannot happen that P(Y) = P(X). □

LEMMA 3: If P touches a location α, then we can construct
two inputs X and Y such that P(X) and P(Y) are defined,
and for any P' in 𝒮, if P(X) = P'(X) and P(Y) = P'(Y)
then P' must touch α.

PROOF: Let Z be the minimal α point. By lemma 2 we can
construct an input X such that P(X) is defined, X > Z and
F_X ≠ F_Z. Let Y be X pruned at α.

We first assert that P(Y) is defined and F_Y = F_Z. To see
this we note that every point which was tested by P in
computing P(Z) and found to be non atomic is also non
atomic in Y. α is atomic in both, and if the output
function was defined on Z then it must be defined on Y
which is strictly larger.

Now suppose there existed some program P' such that P'(X)
and P'(Y) were computed correctly but P' did not touch α.
We see immediately that this cannot happen since all other
positions are either the same in X and in Y or they exist
in X but not in Y. Hence if P'(Y) is defined it would
imply F_X = F_Y, a contradiction. □

Define  the  positions  which  P  touches  without  going 
into  recursion  to  be  the  primary  positions  of  P. 


Given a program P to test our first task is then to
construct a set of test inputs using lemma 3 which
demonstrate that each of the primary positions must be
touched.


Observe  that  this  set  contains  at  most  2n  elements. 


We will say a selector function f factors a selector
function g if g is equivalent to f composed with itself
some number of times. For example CADR factors CADADADR.
We will say that f is a simple factor of g if f factors g
and no function factors f, other than f itself.
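The factoring relation reduces to a check on strings once a selector is written as its path. In the sketch below a selector is a string over "a"/"d" read from the root, and lisp_name_to_path converts a name such as CADR (which applies CDR first) into that form. All three function names are hypothetical, and taking the shortest repeating prefix as the simple factor is our own observation, not a construction from the paper.

```python
def lisp_name_to_path(name):
    """Convert e.g. 'CADR' to the root-to-leaf path 'da' (CDR first, then CAR)."""
    return name[1:-1][::-1].lower()


def factors(f, g):
    """True if g equals f composed with itself some number of times (at least once)."""
    if not f:
        return False                         # the trivial selector factors nothing
    count, remainder = divmod(len(g), len(f))
    return remainder == 0 and count >= 1 and f * count == g


def simple_factor(g):
    """Shortest selector that factors g; assumes g is non trivial."""
    for size in range(1, len(g) + 1):
        if factors(g[:size], g):
            return g[:size]
    return None


cadr = lisp_name_to_path("CADR")
cadadadr = lisp_name_to_path("CADADADR")
print(factors(cadr, cadadadr), simple_factor(cadadadr))
```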




We now construct a second set of data points in the
following fashion:

For each selector variable x_i:

1) x_i is an output variable used in output function f_j.
Let α be the position first tested by p_j after P(X) has
recursed to a depth of at least u^2. Then we generate the
minimal α point.

2) x_i is not an output variable, but is a predicate
variable. Let α be the first position with depth greater
than u^2 that is touched in x_i. First generate the
minimal α point, then using lemma 3 generate two inputs
which demonstrate that position α must be touched.

Notice that we have added no more than 3m points.

THEOREM 1: If P' is in 𝒮 and P' computes correctly on all
data points computed so far, then the recursion selectors
of P' must be powers of σ_i.




PROOF: Observe the fact that if x_i is an output variable
in P, it must appear as a result in at least one input X
in our test data space, hence if P'(X) is correct x_i must
be an output variable for P' also.

The proof of theorem 1 will then rest on the following two
cases.

Case 1: x_i is an output variable. By construction there
exists some X in our test data space such that P(X)
recurses to a depth of at least 3u (< u^2) before exiting
by the jth output function, where x_i is an output
variable in f_j.

Assume that the ith recursion selector in P' is not a
power of σ_i. Then somewhere before the ith variable has
recursed to a depth of u their paths must diverge.

Once the ith variable steps past the points where the
paths in the two programs diverge it can never have access
to the subtrees used in P by f_j in its output. Hence P'
on X must halt before the ith variable has recursed to a
depth of u.

But if that is the case then its output functions cannot
access subtrees rooted any deeper than 2u. By construction
the correct output requires trees which can only be
accessed by going at least 3u deep, hence a contradiction
is obtained.

Case 2: x_i is not used as an output variable. Assume the
recursion selector of x_i in P' is not a power of σ_i.
Then once the variables x_i have recursed past the depth u
they will be in totally different subtrees of their input
(see figure 3.)

By construction it is required that P' touch a point whose
depth is at least 3u. P' must therefore touch this point
before the ith variable diverges from the path taken by P,
hence before it has reached a depth of u. But by
definition P' cannot touch any points deeper than 2u in
this region, hence a contradiction is obtained. □

Theorem 1 gives us a way to demonstrate that a program Q
must have the same recursion selectors, up to a power, as
does P. We now wish to derive a slightly stronger result.
We will show that there exists a constant r such that the
recursion selectors of P' are exactly S^r.

Note that by definition we know that |S^r| (that is, the
maximum depth of any function in S^r) is less than u.

THEOREM 2: If P' is in 𝒮 and computes correctly on all
the points we have so far computed, then there exists a
constant r such that the recursion selectors of P' are
exactly S^r.


PROOF: We know by theorem 1 that the recursion selectors
of P' must be powers of σ_i. For each 1 <= i <= m
construct the ratio of the power of σ_i in P' to that of
P. Let x_i be the variable with the smallest such ratio
and x_j be the variable with the largest. From the fact
that these ratios are different we will obtain a
contradiction.

Case 1: x_i is an output variable. By construction there
is an input X such that P'(X) must recurse on X to a depth
of at least u^2 before outputting by an output function
which uses x_i. This implies that P' must recurse at least
u times. Since in comparison to the program P the variable
x_j is gaining at least one level each recursion we have
that either 1) P'(X) is undefined because x_j ran off the
end of its input, or 2) P'(X) must halt before it has
recursed to a depth of u(u-1) in x_i in which case it
cannot have produced the correct output.

The argument in the case where x_i is a predicate
variable, but not an output variable is almost the same
and is here omitted.

By lemma 3 we know that if P touches a location α, then we
can construct a pair of inputs with the property that any
program P' in 𝒮 which executes correctly on these two
inputs must also touch α. We now present the converse
lemma.

LEMMA 4: If P works correctly on the test data so far
constructed, and does not touch a location α, then we can
construct two inputs X and Y with the property that any P'
in 𝒮 which executes correctly on all this data must also
not touch the position α.

PROOF: Let x_i be the variable containing α. Let v be the
maximum depth any variable has obtained just after the ith
recursion selector passes the depth of α. Let X be a set
of complete trees of depth v+2u, pruned at α.

There are two cases, depending upon whether P(X) is
defined or not.

Case 1: P(X) is not defined. Assume P' touches α. Let Z be
the minimal α point in P' (we need not be able to
construct this point.) We see that Z < X. But this then
implies that P'(X) must be defined, a contradiction.

Case 2: P(X) is defined. By lemma 2 we can construct an
input Z > X so that F_X ≠ F_Z. Let Y be Z pruned at α.

Assume P(X)=P'(X) and P(Y)=P'(Y) and P' touches α. If P(Y)
is undefined we are done, since P'(Y) must be defined. So
assume P(Y) is defined. In this case, since P does not
touch α, F_Y = F_Z ≠ F_X. But if P' touched α then since
X < Y we would have F_X = F_Y, a contradiction. □

Next we show that the primary positions of P' must be
exactly those of P.

Let p_1, ..., p_n be an ordering of the primary positions
of P such that the depth of the position tested by p_i
[illegible] the depth of that tested [illegible].


We know the recursion selectors of P' are S^r where
|S^r| < u. This gives us at most u possibilities. For each
possibility we proceed in turn as follows:

Assume position α_i (i = 1...n) is not primary in P'. We
can construct a point which is then tested by P earlier
than α_i by imagining the root input was actually the
result of one recursion, and then looking at the position
α_i in relation to the earlier root (see figure 4.)

Now one of two cases arises. Either

1) the new position is not touched by P, or

2) the new position corresponds to a position α_j, j<i.

In the first case we can construct two inputs which
demonstrate the position in question must not be touched.
The second case immediately rules out S^r as the recursion
selector, since by induction α_j is primary to P' and
hence P' would not be an element of 𝒮.

Notice we have increased our test set size by no more
than 2nu elements. The resulting test set then gives us
the following theorem.

THEOREM 3: If P'(X) = P(X) for X in our test set, then
the primary positions of P' are exactly those of P.


Notice also that by the generic property this implies the
following corollary:

THEOREM 4: The output functions of P' are exactly those
of P.


Once we have that the primary positions of P' are exactly
those of P, we can now return to the problem of showing
that the selector functions of P' must be exactly those of
P. Consider each of the alternative possibilities for S^r
(no more than u of them.) Since the rates of recursion of
P and P' differ, one of three cases must arise. Either

1) P' touches the same point twice (which means P' is not
in 𝒮 and is out of the running.)

2) P' touches a point which P fails to touch, or

3) P touches a point which P' fails to touch.

Since we only need to test for the last two conditions we
need augment our test set with no more than 2u points.

We then have the following theorem:

THEOREM 5: The recursion selectors of P' must be exactly
those of P.

Pushing onward we next want to consider the recursion
constructors. Once we have the other elements fixed,
however, the constructors are almost given free. All we
need do is to construct p data points so that the ith
data point causes the program P to recurse once and exit
using an output function which uses the ith constructor
variable. By the generic property and the fact that the
entire ith constructor variable is then open to inspection
we have the next theorem.

THEOREM  6:  The  recursion  constructors  of  P'  must  be 
exactly  those  of  P. 

What  remains?  Well  the  order  in  which  the  primary 
positions  are  tested  is  the  only  thing  we  have  not  nailed 
down.  For  each  primary  position  d  add  to  our  test 
data.  We  leave  it  to  the  reader  to  verify: 

THEOREM 7: The order of predicate evaluation in P' is
exactly  that  of  P. 

Counting the size of our test set, we see now that
it contains no more than 3(n+m)+2(p+u+nu) points. Combining
all the theorems proved in this section we then
have our main result, which states:

THEOREM: Given a program P in d, there exists a set of no
more than 3(n+m)+2(p+u+nu) elements such that if P' is
any program in 6 which computes the same results on this
set as P does, then P' must be equivalent to P.


COROLLARY: Either P is correct or no program in 6
realizes the intended function.

5.  AN  EXAMPLE 

The  following  example,  taken  from  [6],  will  be  used 
to  illustrate  some  of  the  ideas  here  presented. 

The  program  is  given  by  [6]  as  follows: 

(REVDBL
  (LAMBDA (ARG1)
    (COND
      ((NULL ARG1) NIL)
      (T (APPEND (REVDBL (CDR ARG1))
                 (LIST (CAR ARG1) (CAR ARG1]

We  will  translate  it  into  the  following  form. 

REVDBL(X,Y) = ATOM(X) -> Y

              Y <- CONS(CAR(X),CONS(CAR(X),Y))

              X <- CDR(X)
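As an illustrative aside, the two forms of REVDBL can be compared with a minimal sketch. It is written in Python rather than LISP purely so that it can be run as-is; the function names are hypothetical, Python lists stand in for LISP lists, and the accumulator Y is assumed to start as the empty list.

```python
def revdbl_recursive(arg1):
    """Transcription of the LISP definition, with Python lists as LISP lists."""
    if not arg1:                                   # (NULL ARG1) -> NIL
        return []
    # (APPEND (REVDBL (CDR ARG1)) (LIST (CAR ARG1) (CAR ARG1)))
    return revdbl_recursive(arg1[1:]) + [arg1[0], arg1[0]]


def revdbl_iterative(x, y=None):
    """Translated form: recurse on X while accumulating the result in Y."""
    y = [] if y is None else list(y)               # assumed initial value of Y
    while x:                                       # ATOM(X) -> Y stops the recursion
        y = [x[0], x[0]] + y                       # Y <- CONS(CAR(X),CONS(CAR(X),Y))
        x = x[1:]                                  # X <- CDR(X)
    return y


if __name__ == "__main__":
    print(revdbl_recursive(["A", "B"]))
    print(revdbl_iterative(["A", "B"]))
```

Both functions reverse the input and double each element, which is the behaviour the translation is meant to preserve.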

Using the formula given in the main theorem, we see
that a test set exists for this program containing no
more than 2C points. However, if one follows the
arguments given in this paper, one finds that actually
the three points given in figure 5 suffice. This illustrates
the point that we have actually been rather
liberal in our counting, and usually a much smaller test
set can be found than the limit stated in our main
result.


[1]  R.A. DeMillo, R.J. Lipton and F.G. Sayward, "PROGRAM
MUTATION: A New Approach to Program Testing",
presented at the Navy Laboratory Computing Committee
Symposium on Software Specification and Testing
Technology, April 1978.

[2]  T.A. Budd, R.A. DeMillo, R.J. Lipton and F.G.
Sayward, "The Design of a Prototype Mutation System
for Program Testing", Proceedings of the 1978
National Computer Conference, pp. 623-627.

[3]  R.A. DeMillo, R.J. Lipton and F.G. Sayward, "Hints on
Test Data Selection: Help for the Practicing Programmer",
Computer, 11,4 (April 1978), pp. 34-41.

[4]  T.A. Budd and R.J. Lipton, "Mutation Analysis of
Decision Table Programs", Proceedings of the 1978
Conference on Information Sciences and Systems, pp.
346-349.

[5]  S. Hardy, "Synthesis of LISP Functions from Examples",
Proceedings of the Fourth International Joint
Conference on Artificial Intelligence.

[6]  D.E. Shaw, W.K. Swartout, and C.C. Green, "Inferring
LISP Programs from Examples", Proceedings of the
Fourth International Joint Conference on Artificial
Intelligence.

[7]  P.D. Summers, "Program Construction from Examples",
Ph.D. Thesis, Department of Computer Science, Yale
University, New Haven, Ct., 1975.




The  design  of  a  prototype  mutation  system 
for  program  testing* 

by TIMOTHY A. BUDD, RICHARD J. LIPTON and FREDERICK G. SAYWARD

Yale University

New Haven, Connecticut

and

RICHARD A. DeMILLO

Georgia Institute of Technology
Atlanta, Georgia


INTRODUCTION 

When testing software the major question which must always
be addressed is "If a program is correct for a finite
number of test cases, can we assume it is correct in general?"
Test data which possess this property are called adequate
test data, and, although adequate test data cannot in
general be derived algorithmically [1], several methods have
recently emerged which allow one to gain confidence in
one's test data's adequacy.

Program mutation is a radically new approach to determining
test data adequacy which holds promise of being a
major breakthrough in the field of software testing. The
concepts and philosophy of program mutation have been
given elsewhere; the following will merely present a brief
introduction to the ideas underlying the system.

Unlike previous work, program mutation assumes that
competent programmers will produce programs which, if
they are not correct, are "almost" correct. That is, if a
program is not correct it is a "mutant": it differs from a
correct program by simple errors. Assuming this natural
premise, a program P which is correct on test data T is
subjected to a series of mutant operators to produce mutant
programs which differ from P in very simple ways. The
mutants are then executed on T. If all mutants give incorrect
results then it is very likely that P is correct (i.e., T is
adequate). On the other hand, if some mutants are correct
on T then either: (1) the mutants are equivalent to P, or (2)
the test data T is inadequate. In the latter case, T must be
augmented by examining the non-equivalent mutants which
are correct on T: a procedure which forces close examination
of P with respect to the mutants.
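The procedure just described can be illustrated with a minimal sketch. It is written in Python (not the FORTRAN handled by the pilot system) so that it is runnable here; the program `original`, the three hand-written mutants, and the helper `live_mutants` are all hypothetical names chosen for illustration and are not part of the system described in this paper.

```python
def original(a, b):
    """Program P: return the larger of two numbers."""
    return a if a > b else b


# Hand-written mutants of P, each differing from P by one simple change.
mutants = {
    "a >= b": lambda a, b: a if a >= b else b,   # equivalent to P
    "a < b": lambda a, b: a if a < b else b,     # relational operator changed
    "return a": lambda a, b: a,                  # statement replaced
}


def live_mutants(program, candidates, tests):
    """Names of mutants that agree with the program on every test case."""
    return [name for name, mutant in candidates.items()
            if all(mutant(*t) == program(*t) for t in tests)]


weak_tests = [(2, 1)]
strong_tests = [(2, 1), (1, 2)]

if __name__ == "__main__":
    print(live_mutants(original, mutants, weak_tests))
    print(live_mutants(original, mutants, strong_tests))
```

With the weak test set two mutants stay alive; one of them is case (2), inadequate test data, and is eliminated by the added test. The mutant that remains alive under the stronger set is an instance of case (1): it is equivalent to P, so no test data can eliminate it.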

At  first  glance  it  would  appear  that  if  T  is  determined 
adequate  by  mutation  analysis,  then  P  might  still  contain 
some  complex  errors  which  are  not  explicitly  mutants  of  P. 


* This research was supported in part by NSF Grant MCS76-4I4S6 and U.S.
Army Research Grant DAAG-2W 76-0-033*


To  this  end  there  is  a  COUPLING  EFFECT  which  states 
that  test  data  on  which  all  simple  mutants  fail  is  so  sensitive 
that  it  is  highly  likely  that  all  complex  mutants  must  also 
fail. 

Readers wishing a further exposition of the ideas of mutation
and substantiation of the assumptions made are referred
to References 2 and 10.


THE  SYSTEM 

A pilot system has been built to implement mutation analysis
on programs written in a subset of FORTRAN. The key
features of this system are summarized in Figure 1. The
system itself consists of 10,000 lines of FORTRAN code,
and required six man months to design, implement and
debug.

Notice we claim the system is man/machine interactive.
In general an attempt is made to assign to the user and to
the machine the tasks which are best suited to
their particular capabilities. One way to see this is to
view the system as a sort of "Devil's Advocate", which
when confronted with a program asks very difficult questions
about the motivation behind it ("why did you use this
type of statement here, when an alternative statement works
just as well?"). The job of the human is then to provide
justification (in the form of test data), which will give an
answer to such questions.

An overview of the structure of the system is given in
Figure 2. We point out that the language FORTRAN was
chosen for the first implementation merely as a matter of
convenience since it is in common use and there is a large
body of software in existence to experiment on. The heart
of the system (roughly that shown within the dotted box) is,
however, language independent, and given a sufficiently
general internal form, to implement a new language one
would merely write a new input/output interface. Projects
are currently under way to implement mutation analysis on



National  Computer  Conference,  1978 


INTERACTIVE 
MACHINE  INDEPENDENT 
LANGUAGE  INDEPENDENT  STRUCTURE 
MODULAR  DESIGN 

INTENSIVE  MAN/MACHINE  INTERACTION 
Figure  1 — Key  features  of  the  pilot  mutation  system 


COBOL and C (an ALGOL-like language) using the structure
represented by the box contained in the dotted lines.

An attempt was also made to keep the structure of the
system largely machine independent. The system was originally
programmed to run on a PDP-10 at Yale University.
Currently we are in the process of transferring it to a CDC
7600 at the Georgia Institute of Technology.

A single run of a mutation system divides naturally into
three phases: the RUN PREPARATION phase, in which the
necessary variables to send to the mutation executor are
defined; the MUTATION phase, in which the actual mutations
are produced and executed; and the POST RUN phase,
in which results are analyzed and reports are generated. In
the following we will describe in more detail the structure
and effects of each phase.

The role of the run preparation phase is to initialize the
various files and data buffer areas used by the mutation
executor. It is characterized by a very interactive nature.
The first object the user is requested to supply is the name
of the file on which the FORTRAN subroutine resides. Then,
depending on whether PIMS has been run previously on this
routine (in which case the internal form is stored on one of
the many files PIMS constructs, see below), the subroutine
is parsed into a concise internal format which is subsequently
interpreted to simulate execution of the program. A


Figure 2


IF (A .LT. X(2)) P = 1

[SCALAR, A]
[ARRAY1, X]
[CONSTANT, 2]
[AOP, SUBSCRIPT]
[ROP, LT]
[TRF, 0]
[SCALAR, P]
[CONSTANT, 1]
[ASSIGN, 0]

Figure 3


fragment of the internal code generated for a given statement
is shown in Figure 3.

The  user  is  then  interactively  prompted  for  the  test  data 
on  which  the  program  and  mutants  are  to  be  tested.  After 
each  test  case  has  been  specified  the  original  program  is 
executed  on  the  test  case  and  the  results  displayed  so  that 
the  user  may  satisfy  himself  that  the  results  produced  are 
indeed  correct. 

After the test data has been entered the user is prompted
for a listing of which mutant operators he wishes to enable.
At present there are 25 mutant operators. These range from
very simple low level ones, such as replacing each data
occurrence (where a data occurrence is a scalar, constant
or array reference) with all other syntactically correct data
occurrences, to very high level mutations, such as deleting
statements or altering the control structure of the program.
A more detailed description of the mutations performed can
be found in Reference 3.

Instead of constructing multiple copies of the program,
for each mutant a short (four word) description of the mutation
to be performed is kept. Each time the mutant is to
be run the original program is then mutated according to the
contents of this descriptor.
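The descriptor idea can be sketched as follows. This is a minimal illustration in Python, not the pilot system's actual record layout: the paper says only that a descriptor is four words long, so the four fields used here (operator name, token position, old token, new token), the token-list program representation, and the names `MutantDescriptor` and `apply_descriptor` are all assumptions.

```python
from typing import List, NamedTuple


class MutantDescriptor(NamedTuple):
    """Hypothetical four-field record standing in for the four-word descriptor."""
    operator: str    # which mutant operator produced this mutant
    position: int    # index of the token to change
    old: str         # token expected at that position
    new: str         # token to substitute


def apply_descriptor(tokens: List[str], d: MutantDescriptor) -> List[str]:
    """Build the mutant on demand; the original token list is left untouched."""
    if tokens[d.position] != d.old:
        raise ValueError("descriptor does not match the program")
    mutated = list(tokens)
    mutated[d.position] = d.new
    return mutated


program = ["IF", "(", "A", ".LT.", "X(2)", ")", "P", "=", "1"]
descriptors = [
    MutantDescriptor("relational operator replacement", 3, ".LT.", ".LE."),
    MutantDescriptor("constant replacement", 8, "1", "2"),
]

if __name__ == "__main__":
    for d in descriptors:
        print(" ".join(apply_descriptor(program, d)))
```

Keeping only descriptors means that storage grows with the number of mutants rather than with the number of mutants times the program size, which is presumably the motivation for the design described above.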

After the user has specified to the system his program,
test data and the mutant operators he wishes applied, the
system then enters the MUTATION phase. During this
phase there is no user interaction. Mutation descriptor records
are read in, one by one, and the mutation is produced.
The mutant program is then executed on the test data and
marked either "dead," meaning it produced results differing
from the original program on at least one test case, or "living."
A dynamic record is kept of the number and percentage
of living mutants of each mutation type.

When all the mutant programs have been tested the post
run phase is entered. In this phase statistics are displayed
indicating the results of the mutation run. In addition the
user can interactively view descriptions of those mutations
which have survived. He can also specify that certain reports
be generated in order to provide a detailed permanent
record of the mutation run.

Figure 4 (RUN PREPARATION PHASE, MUTATION PHASE, POST RUN PHASE)

At  this  point,  or  at  a  later  date,  the  user  can  re-run  the 
system  and  augment  his  test  data  in  an  attempt  to  make  the 
remaining  mutants  fail.  He  may  also  specify  that  additional 
mutant  operators  be  applied  to  the  program.  This  cycle  can 
continue  until  the  user  is  satisfied  that  the  current  test  data 
adequately  tests  his  program. 

There are several files the system produces in order to
store information from one run to the next. These are shown
in Figure 4, which outlines the major functions of each
phase. The internal form file stores the parsed version of the
program. The test data file stores for each test case the test
data input and the results of execution of that test data. The
mutants information file keeps the mutant descriptor records
plus various other counts on what types of mutants have
been produced.


A COMPARISON OF PIMS TO OTHER DATA
TESTING SYSTEMS

Various systems have been discussed in the literature for
increasing confidence in the adequacy of test data, as the
PIMS system does, or automatically constructing test data
which meets some criterion. In this section we will report
on experiments which show that the PIMS system is an
improvement in this area over other systems which have
been proposed.

The most widely known methods of constructing test data
automatically are those which utilize path analysis [4-7].
Essentially, these procedures attempt to construct
data which force each statement to be executed at least
once, and furthermore which traverse each feasible flow
path through the code at least once. In some cases, such as
loops, only an approximation to this can be made as the
number of flow paths may be infinite. Here it is usual to just
construct data which cause the loop to be executed at least
twice.

These same objectives are met with mutant analysis in a
number of ways, some directly by mutant operators, others
indirectly by the coupling effect. There are mutants which
cause each statement in the original program to be replaced
by a TRAP statement, a special type of statement which if
ever executed causes the program to immediately abort.
Obviously, then, if there is some statement in the program
which is never executed, changing that statement to a TRAP
statement will not alter the output of the program, and hence
the surviving mutant will easily be detected.

Checking  that  every  decision  path  is  taken  is  essentially 




      DO 10 I = 1, J

   10 CONTINUE

      DO 10 I = 1, 1

   10 CONTINUE

A LIVE MUTATION IF THE LOOP IS
ALWAYS EXECUTED ONLY ONCE.

Figure 5

the same as checking that every predicate in the program
evaluates at least once to both true and false. If this is not
the case, say the predicate always evaluated to TRUE, then
we can mutate the predicate in any way we desire as long
as it retains this property of always remaining TRUE. These
types of mutations are also usually quite obvious and easily
detectable.

Mutation analysis can also insure that each loop is traversed
at least twice. The only way a loop can be traversed
only once (and all loops must be traversed at least once to
pass the TRAP statement mutations) is if the terminating
condition is the same as the starting condition. But in this
case the mutant which replaces the terminating condition by
the starting condition will survive (see Figure 5). This is
once more easily detected.
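The loop argument can be checked on a small case. The sketch below is in Python rather than FORTRAN so that it can be run directly; the summation loop body and the names `total`, `total_mutant` and `survives` are hypothetical, chosen only to give the DO loop of Figure 5 something observable to compute.

```python
def total(values, j):
    """Original loop, DO 10 I = 1, J: sum the first j elements (j >= 1)."""
    s = 0
    for i in range(1, j + 1):
        s += values[i - 1]
    return s


def total_mutant(values, j):
    """Mutant loop, DO 10 I = 1, 1: terminating value replaced by the start."""
    s = 0
    for i in range(1, 1 + 1):
        s += values[i - 1]
    return s


def survives(tests):
    """True if the mutant agrees with the original on every test case."""
    return all(total(v, j) == total_mutant(v, j) for v, j in tests)


if __name__ == "__main__":
    data = [4, 5, 6]
    print(survives([(data, 1)]))             # loop body runs once only
    print(survives([(data, 1), (data, 2)]))  # a test that runs the loop twice
```

As long as every test case runs the loop body exactly once the mutant stays alive; adding one case that traverses the loop twice kills it.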

With  this,  mutant  analysis  possesses  all  the  capabilities  of 

      SUBROUTINE BSERCH(X,Y,N,A,IHIGH,LOW,ERR)
      INTEGER X(N),Y(N),N,A,IHIGH,LOW,ERR,MID
C     BINARY SEARCH PROCEDURE. IF X CONTAINS A ON RETURN
C     X(IHIGH) = A, IHIGH=LOW. IF NOT X(LOW) < A < X(IHIGH).
C     IF A IS OUT OF RANGE ON RETURN ERR CONTAINS 1
      ERR = 0
      IF ((X(1)-A).GT.0) GOTO 11
      IF ((A-X(N)).LE.0) GOTO 5
   11 ERR = 1
      RETURN
    5 LOW = 1
      IHIGH = N
    6 IF ((IHIGH-LOW-1).NE.0) GOTO 7
      RETURN
    7 MID = (LOW+IHIGH) / 2
      IF ((A-X(MID)).GT.0) GOTO 10
      IHIGH = MID
      GOTO 6
   10 LOW = MID
      GOTO 6
      END

Figure 6


Replace
      IF ((IHIGH-LOW-1).NE.0) GOTO 7
by
      IF ((IHIGH-LOW-1).GT.0) GOTO 7

Replace
      MID = (LOW+IHIGH) / 2

by 

MID 

-  1 LOVH  IHICH)-.' 

Figure  7 


path  analysis  systems  which  have  been  discussed  in  the 
literature. 

Another class of systems for which extensive claims have
been made are those which detect uninitialized variables and
dead code [8]. Uninitialized variables are caught as a consequence
of the interpretation process in the mutant system.
Dead code is easily caught since an assignment made to a
dead variable can be mutated in any way whatsoever and
the program will remain the same.

A third class of systems for which there has recently been
much discussion involves symbolic execution of the program.
In one study [9] Howden analyzed 12 programs containing a
total of 22 errors. He found that symbolic execution would
catch 13 of those errors, while path analysis would discover
only nine. In a similar study we estimated that mutation
analysis, using only the mutant operators in the present
PIMS system, would uncover 18 of the 22 errors. Of the
remaining four, three would probably be discovered if we
added two new mutant operators which the authors simply
had not thought about. Hence, mutation analysis is in certain
cases an improvement over symbolic execution.

As an example of the very subtle errors which mutation
analysis can discover consider the program to perform binary
search shown in Figure 6. If it happens that N=1 when
the subroutine is called (i.e., the vector to be searched
contains only a single element) then it is not difficult to see
that the program will loop indefinitely. It is not clear that
either symbolic execution or path testing would be sufficient
to discover this error.

When mutant analysis is applied to this program there are
two mutants generated (shown in Figure 7) which can only
be eliminated by a test case consisting of one element.
Hence the error is easily detected using mutant analysis.
(There is a second error in this program which is also uncovered
by mutant analysis. The discovery of that second
error is left to the reader.)
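The N=1 behaviour can be reproduced with a small port of Figure 6. This is a sketch only: it is written in Python rather than FORTRAN, the name `bserch`, the `relation` switch and the `max_steps` cap are assumptions added so that the non-terminating case can be observed safely, and the unused argument Y is dropped. The port deliberately keeps the program's logic unchanged, errors included.

```python
def bserch(x, a, relation="NE", max_steps=1000):
    """Port of Figure 6 with 1-based LOW/IHIGH, as in the FORTRAN original.

    relation="NE" is the original test at label 6; relation="GT" is the
    first mutant of Figure 7. Returns (ihigh, low, err), or None if the
    loop is still running after max_steps iterations.
    """
    n = len(x)
    if x[0] - a > 0:               # IF ((X(1)-A).GT.0) GOTO 11
        return (None, None, 1)
    if not (a - x[n - 1] <= 0):    # IF ((A-X(N)).LE.0) GOTO 5, else fall to 11
        return (None, None, 1)
    low, ihigh = 1, n
    for _ in range(max_steps):
        d = ihigh - low - 1
        keep_going = (d != 0) if relation == "NE" else (d > 0)
        if not keep_going:         # RETURN
            return (ihigh, low, 0)
        mid = (low + ihigh) // 2   # MID = (LOW+IHIGH) / 2
        if a - x[mid - 1] > 0:
            low = mid
        else:
            ihigh = mid
    return None                    # step cap reached: treated as looping


if __name__ == "__main__":
    print(bserch([1, 3, 5, 7], 5), bserch([1, 3, 5, 7], 5, relation="GT"))
    print(bserch([5], 5), bserch([5], 5, relation="GT"))
```

On the four-element vector the original and the mutant agree, so that test case leaves the mutant alive. On the one-element vector the original never reaches its RETURN while the mutant does, which is how trying to eliminate this mutant exposes the error.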

FUTURE  WORK 

There are several directions in which work is currently
being pursued with respect to mutation analysis and the pilot
mutation system. The most obvious is to show how a similar
system might be built around another language, and research
is under way to construct systems for COBOL and for C.

Another area of study is the design of an easy to use
language for the description of test cases which allows for
a variety of features. Test datasets can often be quite
lengthy, yet two test cases can be very similar. Also, a user
often wishes just to construct a number of random test cases
following some specification. (Some of the pitfalls of using
random data to test programs are discussed in Reference 10
where it is seen that mutation analysis can help in deriving
"good" random test data.) Finding an easy yet powerful
method of solving this problem is the goal of one area of
research.

Finding a method to detect equivalent mutants is another
area currently being pursued. It is often the case that a
mutation will not produce a significantly different program
(replacing the sequence I=1, J=1 with the sequence I=1, J=I
is a trivial example). We have observed that programs tested
have between one and two percent equivalent mutants. A
method to automatically detect and remove equivalent mutants
would allow us to provide even more significant measures
of the adequacy of a test data set.

We  point  out  that  as  a  consequence  of  the  modular  design 
of  the  pilot  system  either  of  the  above  two  major  extensions 
can  be  added  without  a  significant  reprogramming  effort. 

A final area of current interest is the study of mutant
operators. Certain operators seem to have a much greater
ability to detect errors than others. Analysis of data along
these lines would allow us to discover an order of application
of mutant operators which would maximize the cost/benefit
ratio.

CONCLUSIONS 

It has been shown that the ideas of program mutation can
be quickly and easily implemented as an interactive system
for program testing. The resulting system represents a cost
effective engineering approach to testing real world software.
Large subroutines (over a hundred statements long)
have been analyzed by our system with relative ease.

Mutation is a method of program testing which will significantly
raise the level of reliability in both new and existing
software, and is a major advance in the area of software
testing.


REFERENCES 

1. Goodenough, J. B. and S. L. Gerhart, "Toward a Theory of Test Data
Selection," IEEE Trans. Soft. Eng., SE-1,2, June 1975, pp. 156-173.

2. DeMillo, R., R. J. Lipton and F. Sayward, "PROGRAM MUTATION -
A Method of Determining Test Data Adequacy," in preparation.

3. Budd, T. and F. Sayward, "Users guide to the Pilot Mutation System,"
Yale University Tech. Rep. 114, 1977.

4. Ramamoorthy, C. V., S. F. Ho, and W. T. Chen, "On the Automated
Generation of Program Test Data," IEEE Trans. on Soft. Eng., SE-2,4,
Dec. 1976, pp. 293-300.

5. Howden, W. E., "Methodology for the Generation of Program Test
Data," IEEE Trans. on Comp., C-24,5, May 1975, pp. 554-560.

6. Huang, J. C., "An Approach to Program Testing," Computing Surveys,
7,3, Sept. 1975, pp. 113-128.

7. Miller, E. F. and R. A. Melton, "Automated Generation of Testcase
Datasets," Proc. 1st Int. Conf. on Reliable Software, SIGPLAN Notices
10,6, June 1975, pp. 51-58.

8. Osterweil, L. J. and L. D. Fosdick, "Some Experience with DAVE - A
Fortran Program Analyzer," AFIPS Conference Proceedings, Vol. 45,
1976, pp. 909-915.

9. Howden, W. E., "Symbolic Testing and the DISSECT Symbolic
Evaluation System," IEEE Trans. on Soft. Eng., SE-3,4, pp. 266-278.

10. DeMillo, R., R. J. Lipton and F. Sayward, "Hints on Test Data Selection,"
to appear in Computer, April 1978.




HEURISTICS  FOR  DETERMINING  EQUIVALENCE  OF  PROGRAM  MUTATIONS 


Douglas  Baldwin 
and 

Frederick  Sayward 

Department  of  Computer  Science 
Yale  University 
New  Haven,  Connecticut  06520 


ABSTRACT 


A  mutant  of  a  program  P  is  a  program  M  which  is  derived  from  P  by  making 
some  well-defined  simple  change  in  P.  Some  initial  investigations  in  the 
area  of  automatically  detecting  equivalent  mutants  of  a  program  are 
presented.  The  idea  is  based  on  the  observation  that  compiler 
optimization  can  be  considered  a  process  of  altering  a  program  to  an 
equivalent  but  more  efficient  mutant  of  the  program.  Thus,  the  inverse 
of compiler optimization techniques can be seen as, in essence,
equivalent mutation detectors.


1.0  INTRODUCTION 


A  mutant  of  a  program  P  is  defined  as  a  program  P'  derived  from  P  by 
making  one  of  a  set  of  carefully  defined  syntactic  changes  in  P.  Typical 
changes  include  replacing  one  arithmetic  operator  by  another,  one 
statement  by  another,  and  so  forth.  Program  mutation  has  been  used  by 
DeMillo,  Lipton  and  Sayward  as  the  basis  for  an  interactive  program 
testing system [2]. The theory behind this system is that a set of test
data  T  adequately  tests  a  program  P  if  all  mutants  of  P  are  distinguished 
from  P  by  either  failing  to  produce  any  result  or  producing  a  different 
result  for  some  element  of  T.  On  the  other  hand,  if  a  mutant  performs 
identically  to  P  then  either  T  does  not  fully  test  the  program  and 
further  cases  must  be  developed,  or  the  mutant  is  equivalent  to  P. 
Obviously  it  is  impossible  to  develop  test  data  that  distinguish  between 




equivalent  forms  of  the  same  program,  and  thus  it  is  desirable  that 
equivalent  mutants  be  excluded  from  the  testing  process.  Unfortunately, 
user  recognition  of  equivalent  mutants  has  proven  to  be  a  difficult  and 
tedious  task.  Thus  it  is  important  that  the  system  aid  the  user  by 
either  automatically  detecting  equivalent  mutants  or  by  posing  questions 
which  provide  insights  on  how  to  do  so. 

Our  goal  is  to  develop  heuristics  by  which  equivalent  mutants  can  be 
recognized.  The  heuristics  are  primarily  derived  from  techniques  used  to 
optimize  compiler  code,  since  the  process  of  optimizing  compiler  code  can 
be  thought  of  as  producing  a  series  of  mutants  which  are  equivalent  to 
the  original  program.  It  is  thus  expected  that  some  of  the  tests 
developed  to  determine  when  an  optimization  is  equivalence  preserving  can 
be  applied  to  determine  when  a  mutation  is  equivalence  preserving. 

Once  a  body  of  heuristics  has  been  developed  to  detect  equivalence 
of  mutants  it  will  be  possible  to  develop  a  program  to  actually  recognize 
them  in  a  program  testing  system.  This  system  will  probably  be  very 
similar  to  the  optimization  phase  of  a  compiler.  It  will  generate  some 
representation  of  each  mutant  which  can  be  easily  manipulated  and  apply 
the  heuristics  described  below  to  determine  if  it  is  equivalent  to  the 
original.  If  so  then  the  mutant  will  be  flagged  as  equivalent  and  will 
be  excluded  from  future  testing  runs. 

2.0  PROGRAM  MUTATION 

As defined above a mutant of a program is a second program derived
from the first through carefully defined syntactic transformations.
Program mutation is the process of forming mutants from an input program.




The work described here is intended to find ways of determining
equivalence  of  mutants  derived  as  part  of  a  process  for  testing  FORTRAN 
programs  on  the  EXPER  [4]  testing  system.  The  mutations  made  by  EXPER 
are  chosen  so  as  to  duplicate  as  closely  as  possible  the  mistakes  which  a 
good  programmer  might  make  in  coding  a  FORTRAN  program.  Thus  many  of  the 
mutants  involved,  such  as  DO-loop  end  replacement,  are  specific  to 
FORTRAN.  The  mutations  of  interest  are  described  below: 


1.  Constant Replacement: Replacement of a constant, C, with C+1 or C-1.
    Ex: A=1 becomes A=0.

2.  Scalar Replacement: Replacement of one scalar by another.
    Ex: A=B becomes A=C.

3.  Scalar for Constant Replacement: Replacement of a constant with some
    scalar variable.
    Ex: A=2 becomes A=B.

4.  Constant for Scalar Replacement: Replacement of some scalar variable
    with a constant.
    Ex: A=B becomes A=2.

5.  Source Constant Replacement: Replacement of one constant in the
    program with some other constant found in the program.
    Ex: A=3 becomes A=1 where the constant 1 appears in some other
    statement.

6.  Array Reference for Constant Replacement: Replacement of a constant
    with an array reference.
    Ex: A=1 becomes A=B(1).

7.  Array Reference for Scalar Replacement: Replacement of a scalar
    reference with an array reference.
    Ex: A=B becomes A=C(1).

8.  Comparable Array Name Replacement: Replacement of a reference to one
    array with a reference to the same element of another array of the
    same size and shape.
    Ex: A=B(1,3) becomes A=X(1,3).

9.  Constant for Array Reference Replacement: Replacement of an array
    reference with a constant.
    Ex: A=B(1) becomes A=3.

10. Scalar for Array Reference Replacement: Replacement of an array
    reference with a reference to a scalar.
    Ex: A=B(1) becomes A=C.




11. Array Reference for Array Reference Replacement: Replacement of one
    array reference by another.
    Ex: A=B(1) becomes A=C(2).

12. Unary Operator Insertion: Insertion of one of the unary operators
    ! (absolute value), - (negation), ++ (increment by 1) or
    -- (decrement by 1) in front of any data reference.
    Ex: A=B becomes A=-B.

13. Arithmetic Operator Replacement: Replacement of one arithmetic
    operator (+,-,*,/,**) with another.
    Ex: A=B+C becomes A=B-C.

14. Relational Operator Replacement: Replacement of one relational
    operator (.EQ., .LE., .GE., .LT., .GT., .NE.) with another.
    Ex: IF(A.EQ.B) GOTO 1 becomes IF(A.NE.B) GOTO 1.

15. Logical Connector Replacement: Replacement of one logical connector
    (.AND., .OR.) with the other.
    Ex: A.AND.B becomes A.OR.B.

16. Unary Operator Removal: Deletion of any unary operator.
    Ex: A=!B becomes A=B.

17. Statement Analysis: Replacement of any statement with a trap
    statement whose execution causes immediate failure of the program.
    Ex: GOTO 2 becomes CALL TRAP.

18. Statement Deletion: Removal of any statement.
    Ex: GOTO 2 is removed, i.e. becomes CONTINUE.

19. Return Statement Replacement: Replacement of any statement by a
    RETURN statement.
    Ex: A=0 becomes RETURN.

20. Goto Statement Replacement: Replacement of any GOTO statement with a
    GOTO to a different label.
    Ex: GOTO 1 becomes GOTO 3.

21. DO Statement End Replacement: Replacement of the end label in a DO
    statement with some other label.
    Ex: DO 2 I=1,10 becomes DO 1 I=1,10.

22. Data Statement Alteration: Changing the values assigned by a DATA
    statement.
    Ex: DATA A /2/ becomes DATA A /1/.

23. Unary Operator Replacement: Replacement of one unary operator by
    another.
    Ex: A=!B becomes A=++B.


146 

Obviously some of the mutations described above can produce mutants
which are equivalent to the original program. For instance, replacing
A=0 with A=!0 does not change a program. It might be hoped that
detection of equivalent mutants would be easy, since the mutations
involved are so simple and well defined. Unfortunately this is not the
case. It is easily shown that the general problem of determining the
equivalence of two primitive recursive functions is undecidable [1]. If
we let P1 and P2 be FORTRAN routines corresponding to two arbitrary
primitive recursive functions we can show that the equivalence of mutants
is undecidable. Consider the following program to which the mutation
"GOTO Statement Replacement" has been applied:

      GOTO 1
1     P1
      STOP
2     P2
      STOP

The resulting mutant looks like:

      GOTO 2
1     P1
      STOP
2     P2
      STOP

Plainly these programs are equivalent if and only if P1 and P2 are
equivalent. Since the equivalence of P1 and P2 is undecidable, the
equivalence of the mutant and original programs must also be undecidable.

The easiest way to show that two programs are not equivalent is to
find some input on which they produce different outputs. This is the
basic function of EXPER as a program testing tool, and thus many mutants
do not need to be tested for equivalence. At any given stage those
mutants which produce the same output as the original program on all test
data are called live mutants. Obviously it is only the live mutants to
which sophisticated equivalence tests must be applied at all. Since the
equivalence problem for program mutants is undecidable, any equivalence
testing process will not always be able to detect all equivalent mutants.
Thus the final decision about whether a mutant is equivalent to the
original program might have to be left to the user. The goal of the
testing process should be to make one of three decisions about any
mutant:


1.  It  is  definitely  equivalent  to  the  original  program. 

2.  It  might  be  equivalent  to  the  original  program,  but  the  information 
needed  to  make  this  determination  is  not  completely  available.  The 
system  should  identify  the  needed  information  and  ask  the  user  to 
supply  it. 

3.  None  of  the  known  tests  are  able  to  determine  whether  the  mutant  and 
the  original  are  equivalent.  The  system  is  unable  to  help  the  user 
at  all. 


3.0  OPTIMIZATION  TECHNIQUES 

Almost  all  of  the  techniques  used  in  optimizing  compiler  code  can  be 
applied  in  some  way  to  decide  whether  a  mutant  is  equivalent  to  the 
original  program.  Some  are  useful  only  in  very  limited  sets  of 
situations,  whereas  others  can  be  applied  to  many  types  of  mutation.  All 
the  techniques  discussed  below  can  be  applied  widely  enough  that  it  would 
be worthwhile to implement them in an actual equivalence tester.

The easiest way to implement these techniques is in conjunction with
a flow graph of the program being mutated. A flow graph is a directed
graph in which each node represents a statement or group of statements
through which program control flows linearly (basic blocks). Thus any
node in the flow graph represents a fragment of code which is entered
only at the first statement of the block and exited only from the last.
Furthermore there are no loops or branches within the node. The edges of
the flow graph represent branches within the program from one basic block
to another. Efficient algorithms exist for generating flow graphs from
programs, for instance the process outlined by Schaefer ([5], pages
12-20). Thus it is reasonable to expect such a representation to be
available to the equivalence tester. Furthermore, since mutants are so
similar to the program from which they are derived, it will be easy to
derive the flow graph of the mutant directly from the flow graph of the
original in most cases. In the discussion below it is assumed that the
equivalence tester can examine programs at the statement and token level;
whether these entities are individual nodes in the flow graph or packed
many per node is irrelevant.
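
The basic-block partition described above can be sketched in a few lines
of Python. This is only an illustration of the usual "leader" rule, not
the process outlined by Schaefer; the tuple encoding of statements and the
name basic_blocks are assumptions made for this sketch.

```python
def basic_blocks(stmts):
    """Split a statement list into basic blocks (lists of statement indices).

    Hypothetical encoding: each statement is (label, kind, target), where
    kind is "assign", "goto", "if_goto" or "return", and target is the
    label branched to (None when the statement does not branch).
    """
    # Every label that is the target of some branch starts a new block.
    targets = {t for (_, kind, t) in stmts if kind in ("goto", "if_goto")}
    leaders = {0} if stmts else set()
    for i, (label, kind, _) in enumerate(stmts):
        if label in targets:
            leaders.add(i)
        # The statement following a branch or RETURN also starts a block.
        if kind in ("goto", "if_goto", "return") and i + 1 < len(stmts):
            leaders.add(i + 1)
    order = sorted(leaders)
    return [list(range(a, b)) for a, b in zip(order, order[1:] + [len(stmts)])]


# IF(A.EQ.0) GOTO 1 / C=0 / B=2 / GOTO 2 / 1 C=1 / B=2 / 2 etc.
example = [
    (None, "if_goto", 1),
    (None, "assign", None),
    (None, "assign", None),
    (None, "goto", 2),
    (1, "assign", None),
    (None, "assign", None),
    (2, "assign", None),
]
print(basic_blocks(example))
```

Each inner list is one node of the flow graph; edges would then be added
from the last statement of each block to the blocks it can branch or fall
through to.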

The  various  optimization  techniques  which  seem  applicable  to  testing 
mutant  equivalence  are  listed  below. 


3.1 Constant Propagation


Constant propagation involves replacing expressions involving
constants with other constants to eliminate run-time evaluation.
Generally the compiler keeps track as far as possible of the value of
each variable throughout the program. At any point where an expression
involves only variables whose values are known the result of the
expression can be computed at compile time and placed in the program as a
new constant. Thus this optimization applied to the code fragment

      A=1
      B=2
      C=A+B

would produce the equivalent code

      A=1
      B=2
      C=3

An  elegant  scheme  for  global  program  analysis  is  given  by  Kildall 
[3].  This  scheme  associates  with  each  statement  of  the  program  a  pool  of 
data  which  are  being  propagated  through  the  program.  Such  data  pools  can 
be  used  for  constant  propagation  by  letting  the  elements  of  the  pool  be 
ordered  pairs  whose  first  element  represents  a  variable  and  whose  second 
element  represents  a  value.  Other  applications  of  this  approach  to 
program analysis are discussed below. This scheme is ideally suited to
the  needs  of  an  equivalence  tester. 
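
A minimal Python sketch of a constant pool for straight-line code follows.
It is not Kildall's algorithm (it does not merge pools where several paths
join); it only shows the pool of (variable, value) pairs growing statement
by statement. The statement encoding and the names value_of and propagate
are assumptions made for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def value_of(operand, pool):
    """Return the known value of an operand, or None if it is unknown."""
    if isinstance(operand, int):
        return operand
    return pool.get(operand)


def propagate(stmts):
    """Return the pool holding before each statement, then the final pool.

    Hypothetical encoding: a statement is (target, expr), where expr is an
    operand (int constant or variable name) or a triple (op, left, right).
    """
    pool, pools = {}, []
    for target, expr in stmts:
        pools.append(dict(pool))
        if isinstance(expr, tuple):
            op, left, right = expr
            l, r = value_of(left, pool), value_of(right, pool)
            value = OPS[op](l, r) if l is not None and r is not None else None
        else:
            value = value_of(expr, pool)
        if value is None:
            pool.pop(target, None)   # target no longer has a known value
        else:
            pool[target] = value
    pools.append(dict(pool))
    return pools


# A=1 / B=2 / C=A+B
for pool in propagate([("A", 1), ("B", 2), ("C", ("+", "A", "B"))]):
    print(pool)
```

The final pool records C=3, which is the value a compiler would substitute
for the expression A+B.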


3.2  Invariant  Propagation 


Invariant propagation is similar to constant propagation in that it
involves associating with each statement of the program a set of
invariant relationships between data elements. For instance, invariant
propagation will note such things about a program as "X<0" or "B=1". As
indicated by the last example constant propagation is a special case of
invariant propagation. This technique is of limited use in compilers,
but is very powerful for detecting equivalent mutants.


Invariant propagation can be implemented using Kildall's scheme for
constant propagation by replacing the variable and value pairs with
triples of the form <object>, <relation>, <object>. Each <object>
represents either a variable or constant, and <relation> is one of the
algebraic relations <, >, =, <=, >=, or <>. The only difficulty is that an
invariant propagation algorithm should be able to replace a strong
relationship with a weaker one (i.e. replace "A=1" with "A>=1"). The
propagation algorithm should also be able to apply transitivity to deduce
relationships such as "A<0" from the relationships "A<B" and "B<0".

3.3  Common  Subexpression  Elimination 

One  of  the  optimizations  frequently  performed  by  compilers  is  to 
recognize  subexpressions  which  occur  many  times  but  only  need  to  be 
evaluated once. For instance, in the code fragment

      A=X+Y
      B=X+Y+Z

the expression "X+Y" is evaluated two times. The common subexpression
can be eliminated by evaluating it once and assigning the result to a
temporary variable T, yielding:

      T=X+Y
      A=T
      B=T+Z

Kildall [3] demonstrates how his scheme for global analysis can be
applied to common subexpression elimination. In this application the
data pools are sets of expressions which are partitioned into equivalence
classes such that all expressions in equivalence class E have the same
value. Thus the example above might have sets as shown below, where "|"
divides equivalence classes. (Note the addition of a CONTINUE statement
to show the set after the assignment to B.)

      A=X+Y           { }
      B=X+Y+Z         {A, X+Y}
      CONTINUE        {A, X+Y | B, X+Y+Z, A+Z}


Note that the algorithm described by Kildall generates equivalent
expressions which are not used in the program, such as A+Z in the same
partition as X+Y+Z above. This feature allows the widest possible range
of equivalent expressions to be recognized.

3.4 Recognition of Loop Invariants

A common optimizing technique removes code from inside loops if the
execution of that code does not depend on the iteration of the loop.
Thus a loop of the form

      DO 1 I=1,10
      A(I)=0
1     B=0

would be replaced by

      DO 1 I=1,10
1     A(I)=0
      B=0

Since many of EXPER's mutations change the boundaries of loops,
techniques for recognizing when code can be removed from a loop can be
useful in detecting equivalences. Conditions for detecting operations
which can be removed from loops are given by Schaefer ([5], pages
122-134).

3.5  Hoisting  and  Sinking 

Hoisting and sinking are related to removal of code from loops in
that they involve moving code which would be repeated several times to a
place where it will only be executed once. Thus the code fragment

      IF(A.EQ.0) GOTO 1
      C=0
      B=2
      GOTO 2
1     C=1
      B=2
2     etc.

could be replaced by

      B=2
      IF(A.EQ.0) GOTO 1
      C=0
      GOTO 2
1     C=1
2     etc.

Here the assignment B=2 has been hoisted to a position before the
conditionally executed part of the program. Similarly sinking involves
moving code to a position after some set of blocks. Mathematical rules
for detecting the feasibility of hoisting or sinking are given on pages
115-119 of Schaefer [5].

3.6  Dead  Code  Detection 

Dead code detection involves the identification of sections of a
program which will either never be executed or whose execution is
irrelevant. An example of typical dead code is the fragment below, in
which the second assignment to A kills the first:

      A=B+C
      A=0

Schaefer [5] discusses rules for detecting dead code of this form on
pages 156-161.

Another example of dead code is the case in which one or more basic
blocks of a program are not connected to the rest of the flow graph.
Then, as long as there is only one entrance to the program, some section
is never executed and can be removed entirely. This case is not expected
to arise very often in programs written by humans, but mutations may
easily make a large part of a program inaccessible from the entry node.
For example, consider the following mutant of a program:

      A=1
      RETURN
      B=A+2
      etc

Here the insertion of the RETURN statement has made everything between it
and the next label which is referenced in a GOTO inaccessible. This type
of dead code is easily detected by examining the flow graph of the
program in question.
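
Detecting this kind of inaccessible code amounts to a reachability search
from the entry node. The Python sketch below assumes the flow graph is
given as a dictionary from each node to its list of successors, with every
node present as a key; the node names and the function name
unreachable_nodes are invented for the illustration.

```python
def unreachable_nodes(edges, entry):
    """Return the set of flow-graph nodes that cannot be reached from entry.

    edges maps every node to the list of its successor nodes.
    """
    seen, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return set(edges) - seen


# Mutant above: node "n1" holds A=1 / RETURN and has no successors;
# node "n2" holds B=A+2 and whatever follows it.
mutant_graph = {"n1": [], "n2": ["n3"], "n3": []}
print(unreachable_nodes(mutant_graph, "n1"))
```

Any node reported here contains code that is never executed when the
program has a single entry point.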


4.0  APPLICATIONS 

Each  of  the  above  optimization  techniques  can  be  applied  to  detect 
equivalent  mutants  arising  from  one  or  more  of  the  mutations  applied  by 
EXPER.  Each  is  discussed  below. 

4.1 Constant Propagation

Constant propagation is most useful for detecting cases in which a
mutant is not equivalent to the original program. Any mutant which could
affect the known value of a variable can be detected in this fashion.
The mutants most easily checked using this scheme are those involving
replacement of one data reference with another (Constant Replacement,
Scalar Replacement, Scalar for Constant Replacement, Constant for Scalar
Replacement, Source Constant Replacement, Array Reference for Constant
Replacement, Array Reference for Scalar Replacement, Array Name
Replacement, Constant for Array Reference Replacement, Scalar for Array
Reference Replacement, Array Reference for Array Reference Replacement,
and Data Statement Alteration). Equivalences which may be detected, but
with lower probability, are those involving changes to expressions
(Arithmetic Operator Replacement, Unary Operator Removal, Unary Operator
Insertion, and Unary Operator Replacement). It is possible that
equivalences involving actual changes to the program flow could be
detected, but it should be much easier to detect these by comparing the
flow graphs.

The mechanism for testing equivalence of mutants using constant
propagation is as follows: At all points subsequent to the mutation
compare the constant pools of the original program and the mutant. If
they differ it is likely (though not certain) that the mutant is not
equivalent to the original program. The following example demonstrates
this form of detection:

Original Program                Mutant Program
Code        Constants           Code        Constants

A=1                             A=2
B=A+2       (A,1)               B=A+2       (A,2)
etc         (A,1),(B,3)         etc         (A,2),(B,4)

Here a mutation has replaced the assignment of 1 to A with an assignment
of 2. The change in the program is reflected in the changed constant
pools following the mutation. Unless the assignments to A and B are dead
it is reasonable to assume that the mutation is not equivalent to the
original, and to try to develop test data which substantiate this
assumption.

A firm test of non-equivalence can be made if one of the output
variables appears in the constant pool for a RETURN statement. Then if
the known value of this variable differs between the mutant and original
programs we know that they are not equivalent, since they return
different values on identical inputs. Obviously this test is valid only
if some path exists from the entry node of the program being tested to
the exit in question. This question can be resolved through dead code
detection.
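
The pool comparison and the firm test at a RETURN statement can be
sketched as follows in Python. The representation of a constant pool as a
dictionary and the names differing_constants and firmly_not_equivalent are
assumptions of this sketch; reachability of the RETURN in both programs is
taken for granted here and would have to be checked separately.

```python
def differing_constants(original_pool, mutant_pool):
    """Return {variable: (original value, mutant value)} where the pools disagree.

    Only variables with a known constant value in both pools are compared.
    """
    shared = original_pool.keys() & mutant_pool.keys()
    return {v: (original_pool[v], mutant_pool[v])
            for v in shared if original_pool[v] != mutant_pool[v]}


def firmly_not_equivalent(original_pool, mutant_pool, outputs):
    """True if some output variable has different known values at a RETURN.

    Assumes both pools belong to the same reachable RETURN statement.
    """
    diff = differing_constants(original_pool, mutant_pool)
    return any(v in diff for v in outputs)


# Pools following B=A+2 in the example above.
print(differing_constants({"A": 1, "B": 3}, {"A": 2, "B": 4}))
```

A non-empty result only suggests non-equivalence; the firm conclusion
requires that a differing variable be an output of the routine.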

4.2  Invariant  Propagation 

As shown above, invariant propagation is really a superset of
constant  propagation,  and  thus  it  can  be  used  to  test  all  the  sorts  of 
mutants  discussed  under  constant  propagation.  However  since  a  great  deal 
more  information  is  carried  by  invariant  relationships  than  by  equality 
to  a  constant,  this  technique  is  far  more  powerful  than  constant 
propagation.  It  is  particularly  useful  for  testing  the  equivalence  of 
mutants  involving  unary  operators  (i.e.  Unary  Operator  Removal,  Unary 
Operator  Insertion,  and  Unary  Operator  Replacement).  In  many  cases  these 
operators  only  affect  an  expression  if  it  has  a  certain  relationship  to 
0.  For  example,  taking  the  absolute  value  of  an  expression  only  changes 
the  program  if  that  expression  evaluates  to  a  value  less  than  zero; 
negating  an  expression  does  not  change  anything  if  that  expression  always 
evaluates  to  0,  and  so  forth.  These  facts  can  be  used  as  shown  in  the 
following  example: 

Original Program                     Mutant Program
Code                 Invariants      Code                 Invariants

IF(A.LT.0) GOTO 1                    IF(A.LT.0) GOTO 1
B=A                  A>=0            B=!A                 A>=0

In this case the conditional allows us to determine an invariant (A>=0),
which in turn allows us to determine that the mutant program is
equivalent to the original, since taking the absolute value of a
non-negative quantity is a no-op.
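
A small Python sketch of this check is given below. It assumes invariants
are stored as (variable, relation, constant) triples and only recognizes
the direct case in which a single invariant bounds the variable below by a
non-negative constant; the names abs_is_noop and negation_is_noop are
invented for the illustration.

```python
def abs_is_noop(var, invariants):
    """True if the invariants guarantee var >= 0, so that !var equals var."""
    for name, rel, bound in invariants:
        if name == var and rel in (">", ">=", "=") and bound >= 0:
            return True
    return False


def negation_is_noop(var, invariants):
    """True if the invariants guarantee var = 0, so that -var equals var."""
    return (var, "=", 0) in invariants


# Invariant pool holding at the statement B=A in the example above.
pool = {("A", ">=", 0)}
print(abs_is_noop("A", pool))
print(negation_is_noop("A", pool))
```

A fuller implementation would first apply transitivity and weakening to
the pool so that indirect bounds are also found.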

The power of invariant propagation is vastly increased if the
propagation and testing algorithms can take advantage of transitivity and
replacement of one condition by a weaker one. Both of these features are
demonstrated below.

Original Program
Code                        Invariants

      A=0
      CONTINUE              A=0
1     B=A                   A>=0, A<5
      C=!B                  A>=0, A<5, B=A
      A=A+1                 A>=0, A<5, B=A
      IF(A.LT.5) GOTO 1     A>=0, A<5, B=A

Mutant Program
Code                        Invariants

      A=0
      CONTINUE              A=0
1     B=A                   A>=0, A<5
      C=B                   A>=0, A<5, B=A, C=B
      A=A+1                 A>=0, A<5, B=A, C=B
      IF(A.LT.5) GOTO 1     A>=0, A<5, B=A, C=B

Note that the algorithm for generating invariant pools recognizes the
loop in this program and is thus able to determine an upper bound on A.
Obviously the invariants shown assume that no other branches to label 1
exist. The relation A=0 is replaced with the weaker A>=0 when the
statement A=A+1 is detected at the end of the loop. Applying
transitivity to the mutated pair C=!B and C=B allows us to decide that
the mutant is equivalent to the original since B=A and A>=0.


There is one important feature of EXPER which is useful in
generating invariant pools: EXPER can perform run-time checks of array
bounds. Thus the following statements generate the invariant pool shown:

Code                  Invariants

DIMENSION A(5)
A(J)=0                J>=1, J<=5

Because EXPER checks array bounds any program aborts if J is less than 1
or greater than 5 in the assignment to A(J). Thus any program or mutant
for which the given invariants did not hold prior to executing the
assignment would have failed, and thus would obviously not be a correct
program.

4.3 Common Subexpression Elimination

Kildall's equivalence partitions [3] provide an excellent way to
handle  mutations  in  assignment  statements.  Changing  an  arithmetic 
operator  changes  the  expression  placed  in  the  equivalence  class  of  the 
variable  to  which  the  assignment  was  made.  Similarly,  mutations  which 
change  an  operand  or  destination  in  an  assignment  will  produce  changes  in 
the  equivalence  classes  following  the  assignment.  Thus  comparing 
equivalence  classes  can  show  that  the  mutant  and  original  differ.  As  an 
example,  consider  the  program  and  mutant  shown  below: 

Original Program
Code        Equivalence Classes

A=B+C
etc.        {A, B+C}

Mutant Program
Code        Equivalence Classes

A=B-C
etc.        {A, B-C}

Comparing  the  two  sets  of  equivalence  classes  shows  that  A  has  a 
different  value  in  the  two  programs.  As  with  constant  propagation,  we 
can  assume  that  the  mutant  is  not  equivalent  to  the  original  program,  and 
that  test  data  should  be  developed  to  verify  this  assumption. 

Common subexpression detection can also be used to show that a
mutant is equivalent to the original program. If the mutation has
changed part of an expression E to E', but E and E' are in the same
equivalence class, then the mutant is equivalent to the original program.
The example below demonstrates this situation:

Original Program
Code        Equivalence Classes

A=B+C
D=B+C       {A, B+C}
X(A+E)=0    {A, B+C, D}

Mutant Program
Code        Equivalence Classes

A=B+C
D=B+C       {A, B+C}
X(D+E)=0    {A, B+C, D}

Since  A  and  D  are  in  the  same  equivalence  class  we  can  conclude  that  the 
mutation  (replacing  A  with  D  in  the  subscript)  did  not  change  the 
program.  Note  that  since  the  equality  of  A  and  D  is  determined  through 
assignment  of  a  common  expression  this  equivalence  would  be  hard  to 
detect  using  a  simpler  heuristic  such  as  invariant  propagation. 
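
The equivalence-class comparison can be approximated with value numbering,
a simpler technique than Kildall's partitioning that is swapped in here
purely for illustration. In the Python sketch below every distinct value
receives a number, and two expressions are in the same class exactly when
their numbers agree. The class name ValueNumbering and the tuple encoding
of expressions are assumptions; commutativity (B+C versus C+B) is not
handled.

```python
class ValueNumbering:
    """Assign a number to each distinct value in straight-line code."""

    def __init__(self):
        self.var_vn = {}    # variable name -> value number
        self.expr_vn = {}   # (op, left number, right number) -> value number
        self.next_vn = 0

    def _fresh(self):
        self.next_vn += 1
        return self.next_vn

    def number(self, expr):
        """Value number of expr, a variable name or an (op, left, right) triple."""
        if isinstance(expr, str):
            if expr not in self.var_vn:
                self.var_vn[expr] = self._fresh()
            return self.var_vn[expr]
        op, left, right = expr
        key = (op, self.number(left), self.number(right))
        if key not in self.expr_vn:
            self.expr_vn[key] = self._fresh()
        return self.expr_vn[key]

    def assign(self, target, expr):
        """Record the assignment target = expr."""
        self.var_vn[target] = self.number(expr)

    def equivalent(self, e1, e2):
        """True if e1 and e2 are known to have the same value."""
        return self.number(e1) == self.number(e2)


vn = ValueNumbering()
vn.assign("A", ("+", "B", "C"))
vn.assign("D", ("+", "B", "C"))
# Original subscript A+E versus mutant subscript D+E.
print(vn.equivalent(("+", "A", "E"), ("+", "D", "E")))
```

Because the numbering is keyed on values rather than names, an expression
such as A+Z that never appears in the program is still recognized as equal
to X+Y+Z after A=X+Y.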

4.4  Recognition  of  Loop  Invariants 

Many  mutations  change  the  size  of  loops.  The  most  obvious  of  these 
is  the  DO-loop  End  Replacement  operator,  although  the  GOTO  Replacement 
operator  can  also  alter  loops.  In  cases  where  a  loop  has  been  changed  to 
include  more  or  less  code  than  in  the  original,  recognition  of  loop 
invariants can be used to decide whether or not the change is
significant.  Examination  of  the  flow  graphs  should  make  cases  in  which 
loops  have  changed  fairly  easy  to  detect;  thus  it  is  easy  to  decide  when 
to  apply  these  tests.  The  basic  application  simply  involves  deciding 
whether  or  not  the  excess  code  (that  is,  the  code  which  does  not  appear 
in  both  loops)  is  loop  invariant.  If  it  is  then  the  expansion  (or 
contraction)  of  the  loop  has  not  changed  the  outputs  of  the  program.  As 
an  example,  consider  the  following  code: 


Original Program
      DO 1 I=1,10
      A(I)=0
1     CONTINUE
2     B=0

Mutant Program
      DO 2 I=1,10
      A(I)=0
1     CONTINUE
2     B=0

The mutation above expanded the DO-loop to include the assignment of 0 to
B. Since this assignment is loop-invariant it does not matter whether it
is done 10 times inside the loop or 1 time outside it. Thus the original
and mutant programs are equivalent.

4.5 Hoisting and Sinking

These  tests  are  used  in  situations  similar  to  those  in  which  testing 
of  loop-invariants  is  used,  except  that  they  apply  to  cases  in  which  the 
code  skipped  or  included  by  a  branch  is  changed.  Candidates  for  this 
sort  of  change  include  GOTO  Replacement  and  Statement  Deletion.  In  these 
cases  the  mutant  and  original  programs  are  equivalent  if  the  code  added 
to  or  removed  from  a  basic  block  can  be  hoisted  or  sunk  out  of  that 
block.  Consider  the  following  example: 


Original Program
      IF(A.EQ.0) GOTO 1
      A=A+1
2     B=0
      GOTO 3
1     B=0
3     etc

Mutant Program
      IF(A.EQ.0) GOTO 2
      A=A+1
2     B=0
      GOTO 3
1     B=0
3     etc

In this case B is set to zero regardless of whether we do it at line 2 or
line 1. A more compact form is produced by hoisting the assignment to B,
namely

      B=0
      IF(A.EQ.0) GOTO 3
      A=A+1
3     etc

Because this hoisting is possible the mutant is equivalent to the
original program.

Because the code skipped by the statement "GOTO 3" can be hoisted
the branch is unnecessary. Thus the hoisting test will also show that
the mutant derived by deleting this branch is equivalent to the original
program.

4.6 Dead Code Detection

As mentioned above this test is very important in guaranteeing the
reliability of tests based on invariant propagation (including constant
propagation). It can also be used to test the equivalence of some
mutants in its own right. The equivalences which are most likely to be
detected by this technique are those arising from mutations that alter
the flow graph in some way. Such mutants include Statement Analysis
(since this mutant replaces any statement with an abnormal exit),
Statement Deletion (if GOTO or RETURN statements are deleted), Return
Statement Replacement, and GOTO Replacement.

The best way to use dead code detection to test mutants of this form
is to examine the flow graphs of the two programs. If any node appears
in the mutant which is not connected to the rest of the graph it is
reasonable to expect that the mutant is not equivalent to the original.
(The only exception is the case in which the disconnected node
consists only of dead assignments. This situation is discussed in
general below.) An example involving Return Statement Replacement is
shown below:


Original Program        Mutant Program
Code                    Code

A=1                     A=1
B=2                     RETURN
C=3                     C=3

[Flow graphs: in the original program the three statements form a single
node; in the mutant they form two nodes with no edge between them.]

The RETURN statement has broken the original single node into 2 nodes
with no connection between them. Thus one can conclude that since code
which is executed in the original program (assuming the node is
accessible in the first place) is not executed in the mutant, the two are
different.

A slightly different application of dead code detection involves
making sure that mutated code is not inaccessible or dead in the first
place. If it is then the mutant must be equivalent to the original
program. This application is identical to the application in compiler
optimization where code is identified as dead and excluded from the final
output. It applies to all mutant operators. An example of this sort of
analysis in testing equivalence is shown below:

Original Program        Mutant Program

A=1                     A=2
A=B+C                   A=B+C

Here the first assignment to A is killed by the second assignment, and
thus any change to its right-hand side is insignificant. A more drastic
example shows inaccessible code. Again, a mutation to code which can
never be executed is unimportant.

Original Program        Mutant Program

GOTO 1                  GOTO 1
A=2                     A=-2


Some cases in which a mutation has killed a block of code can be
detected by using invariant propagation. The program fragment shown
below shows how this can happen:

Original Program
Code                        Invariants

      IF(A.GT.B) GOTO 1
      FLAG1=.TRUE.          A<=B
      IF(A.LT.B) GOTO 2     A<=B
      FLAG2=.TRUE.          A=B
2     etc                   A<=B

Mutant Program
Code                        Invariants

      IF(A.GT.B) GOTO 1
      FLAG1=.TRUE.          A<=B
      IF(A.LE.B) GOTO 2     A<=B
      FLAG2=.TRUE.
2     etc                   A<=B

Here the mutation has replaced the test A<B with the test A<=B. However,
the invariant pool tells us that A is always less than or equal to B, and
thus the branch will always be taken, and the assignment to FLAG2 is
dead. Note that without knowing the relationship between A and B it is
impossible to determine that this assignment is dead.


5.0  AN  EQUIVALENCE  TESTING  POST-PROCESSOR  FOR  EXPER 

The  above  ideas  for  determining  equivalence  can  be  applied  in  a 
post-processor  to  EXPER  in  order  to  reduce  the  time  spent  by  the  user 
dealing  with  equivalent  mutants.  This  processor  should  be  run  after  the 
mutants  have  been  executed  on  the  test  data,  since  experience  shows  that 
as  many  as  90  per  cent  of  the  mutants  can  be  eliminated  on  the  first 
testing  run.  Of  the  remaining  mutants,  those  which  are  found  by  the 
post-processor  to  be  equivalent  are  flagged  as  such  and  the  user  need  not 
consider them further. Only those which are not found to be equivalent
are analyzed by the user to improve his test data. At any point the user
can manually override the post-processor by declaring a live mutant to
be equivalent to the original program or by declaring one that was
thought to be equivalent to be live again.

The  analysis  proceeds  much  as  it  would  in  a  compiler,  with  a  few 
exceptions  which  arise  due  to  the  fact  that  we  do  not  necessarily  want  to 
produce  efficiently  optimized  code.  For  instance,  it  is  not  important 
that we worry about compiler-generated constants, since they can never be
mutated.

The  first  step  is  to  express  the  original  program  as  a  flow  graph, 
as  discussed  above.  This  step  may  be  done  as  part  of  EXPER's  parsing  or 
other  processing  of  the  program.  As  each  live  mutant  is  tested  for 
equivalence  to  the  original  program  a  flow  graph  is  generated  for  it.  In 
many  cases  this  flow  graph  will  be  isomorphic  to  the  original  so  that 
only  the  contents  of  one  node  need  to  be  modified.  In  more  complex 
cases,  where  the  shape  of  the  flow  graph  is  changed,  the  mutant's  flow 
graph  can  still  be  derived  from  the  original.  EXPER  represents  mutants 
as  a  descriptor  record  describing  the  change  made  to  the  original 
program.  These  records  fully  describe  the  mutant,  and  thus  allow  the 
mutant's  flow  graph  to  be  derived  without  re-generating  it  from  a  source 
program. 

Just  as  it  is  expected  that  mutant  flow  graphs  can  be  efficiently 
derived  from  the  original  flow  graph,  it  is  also  expected  that  the 
invariant  and  common  expression  pools  described  above  will  not  have  to  be 
computed  for  each  mutant.  Instead,  the  pools  for  the  original  can  be 
computed at parse time and the mutant's pools derived from them. As
suggested above, many mutations cause a relation to change, move an
expression  from  one  equivalence  class  to  another,  or  make  similarly 
limited  changes  in  the  pools.  These  changes  can  be  easily  detected  using 
the  descriptor  record  of  the  mutant,  and  can  be  made  as  local 
modifications  to  the  pools.  Obviously,  care  will  have  to  be  taken  that 
any  side  effects  of  these  local  changes  are  detected,  but  doing  so  should 
be  significantly  less  expensive  than  regenerating  the  entire  pool. 

The  invariant  and  common  expression  pools  described  above  can  be 
combined  into  a  single  pool  by  replacing  the  individual  variables  or 
constants  involved  in  invariant  relationships  with  the  equivalence  class 
sets  used  to  recognize  common  expressions.  Note  that  using  this  scheme 
the  relationships  "equal  to"  and  "not  equal  to"  do  not  need  to  be 
explicitly  represented,  since  if  two  objects  are  in  the  same  set  they 
must  be  equal,  whereas  if  they  are  not  in  the  same  set  they  must  be 
unequal.  If  the  entire  structure  of  sets  and  relationships  is 
represented  as  a  directed  graph  whose  nodes  correspond  to  sets  and  whose 
edges  to  relationships  (obviously  the  edges  must  be  labelled  as  to  what 
relationship)  then  the  problem  of  applying  transitivity  becomes  one  of 
simply following either edges labelled '>' and '>=' or edges labelled '<'
and '<=' until either the desired relationship is derived or no edges with
the appropriate labels remain. Note that no cycles can occur which
involve such paths. Assume such a cycle did exist, for instance a path
using only edges marked '<' or '<=' from node A to node B and back to node
A. Since a path from A to B exists, transitivity implies that for any X
in A and Y in B, X<=Y. However, because a path from B to A exists we also
have the statement Y<=X. Because X and Y are in different sets we know
that X is not equal to Y, and thus the derived relationships are
contradictory.
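
The combined pool can be pictured with a short sketch. The Python below is illustrative only and is not taken from EXPER: the class name `Pool`, its methods, and the expressions in the usage lines are all assumptions made for this example. Equivalence classes are nodes, '<' and '<=' edges are stored explicitly, and a relationship between two expressions is derived by following edges transitively.

```python
class Pool:
    """Combined pool: nodes are equivalence classes, edges are '<' or '<='."""

    def __init__(self, classes):
        # classes: list of sets of expressions known to be equal to each other
        self.class_of = {e: i for i, c in enumerate(classes) for e in c}
        self.edges = {i: [] for i in range(len(classes))}

    def add_less(self, a, b, strict=True):
        """Record class(a) < class(b), or class(a) <= class(b) if not strict."""
        self.edges[self.class_of[a]].append((self.class_of[b], strict))

    def _reach(self, src, dst):
        """True if a path containing a '<' edge leads src -> dst, False if
        only '<=' paths do, None if dst cannot be reached at all."""
        found, stack, seen = None, [(src, False)], set()
        while stack:
            node, strict = stack.pop()
            if (node, strict) in seen:
                continue
            seen.add((node, strict))
            if node == dst:
                if strict:
                    return True
                found = False
                continue
            for nxt, edge_strict in self.edges[node]:
                stack.append((nxt, strict or edge_strict))
        return found

    def relation(self, a, b):
        """Derive the relationship of a to b, or None if nothing is known."""
        ca, cb = self.class_of[a], self.class_of[b]
        if ca == cb:
            return "="                    # same set, so necessarily equal
        fwd = self._reach(ca, cb)
        if fwd is not None:
            return "<" if fwd else "<="
        bwd = self._reach(cb, ca)
        if bwd is not None:
            return ">" if bwd else ">="
        return None


pool = Pool([{"P", "Q+R"}, {"S", "T+U"}, {"V"}, {"W"}])
pool.add_less("S", "P")                  # S < P
pool.add_less("P", "V", strict=False)    # P <= V
print(pool.relation("T+U", "Q+R"))       # <
print(pool.relation("V", "T+U"))         # >
print(pool.relation("P", "Q+R"))         # =
print(pool.relation("W", "P"))           # None
```

Because a relationship recorded for one member applies to the whole class, the query on T+U and Q+R succeeds even though only S and P were related explicitly.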

Representing the pools in this manner allows a great deal of
flexibility  in  testing  equivalences.  The  following  example  shows  how 
this  can  happen: 


Original Program

Code                      Invariant & Expression Pool

A=B+C
D=E+F                     {A,B+C}
IF(B+C.LE.D) GOTO 1       {A,B+C},{D,E+F}
X(A+C)=0                  {A,B+C}>{D,E+F}
etc.


Mutant Program

Code                      Invariant & Expression Pool

A=B+C
D=E+F                     {A,B+C}
IF(B+C.LE.D) GOTO 1       {A,B+C},{D,E+F}
X(D+G)=0                  {A,B+C}<{D,E+F}
etc.

In  this  example  the  conditional  branch  allows  a  relationship  between  B+C 
and  D  to  be  deduced.  Because  the  relationship  is  then  applied  to  all 
elements  equal  to  either  B+C  or  D  we  can  conclude  that  replacing  A  with  D 
in  the  subscript  yields  a  mutant  subscript  which  is  always  greater  than 
the  original  subscript.  This  fact  suggests  that  the  mutant  is  not 
equivalent  to  the  original. 


Once the modified invariant pool described above is formed it is
used to aid the detection and removal of dead code. Once dead code has
been removed the mutant and original are compared to see if they are
obviously equivalent. If so, the mutant is placed in the equivalent
mutants pool and not processed further.


Since  dead  code  is  irrelevant  to  the  state  of  the  program,  removing 
it will not make the invariant pools incorrect. However, it may be
possible  that  removing  dead  code  enables  invariant  conditions  to  be 
strengthened.  The  following  example  shows  how  this  can  happen: 

Original Program          Mutant Program

A=0                       A=0
IF(C.GT.D) GOTO 2         IF(C.GT.D) GOTO 2
IF(C.LT.D) GOTO 1         IF(C.LE.D) GOTO 1
A=A+1                     A=A+1
1 etc.                    1 etc.

The mutation above is a case in which changing a conditional (C.LT.D
became C.LE.D) kills a block of code. The section of code killed is the
increment of A. Because of this increment the strongest statement that
can be made about A at label 1 is A>=0. Because the increment of A is
dead in the mutant this invariant can be tightened to A=0, assuming no
other branches to label 1 exist.
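
The effect can be checked by brute force. The following Python sketch is an illustration, not part of the system described here; it mirrors the two fragments under the assumptions that label 2 lies elsewhere and that label 1 is reached only from the statements shown, and it collects every value A can hold on arrival at label 1.

```python
def original(c, d):
    a = 0
    if c > d:
        return None          # GOTO 2: label 1 is not reached on this path
    if not (c < d):          # IF(C.LT.D) GOTO 1 falls through when C = D
        a = a + 1
    return a                 # value of A on arrival at label 1


def mutant(c, d):
    a = 0
    if c > d:
        return None
    if not (c <= d):         # IF(C.LE.D) GOTO 1 never falls through here
        a = a + 1            # dead code in the mutant
    return a


pairs = [(c, d) for c in range(-3, 4) for d in range(-3, 4)]
values_original = {original(c, d) for c, d in pairs} - {None}
values_mutant = {mutant(c, d) for c, d in pairs} - {None}
print(sorted(values_original))   # [0, 1]
print(sorted(values_mutant))     # [0]
```

The original can reach label 1 with A equal to 0 or 1, so A>=0 is the best available invariant, while the mutant always arrives with A=0.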

Those  mutants  which  have  not  been  eliminated  by  manipulation  of  the 
flow  graphs  are  then  tested  for  equivalence  based  on  loop  invariants  or 
the  possibility  of  hoisting.  Any  equivalences  thus  found  are  placed  in 
the  equivalent  mutants  pool.  Again,  it  is  often  possible  to  apply  these 
tests  to  the  original  program  at  parse  time  and  deduce  their  results  on  a 
mutant  from  the  mutant's  descriptor  record.  Only  rarely  will  it  be 
necessary  to  actually  test  the  mutant. 

The final phase of the post-processor applies the invariant pools
generated in the first phase to actual detection of equivalent mutants.
In this phase many mutants may be automatically eliminated, especially
those involving unary operators. This is also a convenient place to
provide user interaction in the equivalence determining process. The
processor would be driven by a set of rules describing sufficient
conditions  for  equivalence  of  a  mutant  to  the  original.  For  instance, 
there  might  be  a  rule  concerning  absolute  values  which  can  be 
conceptualized  as  "Insertion  of  absolute  value  preserves  equivalence  if 
its  argument  is  greater  than  or  equal  to  0".  When  the  processor  is 
unable  to  decide  whether  a  rule  is  applicable  by  itself,  it  turns  to  the 
user  for  help.  This  help  is  requested  by  forming  a  question  from  the 
rule  and  posing  this  question  to  the  user.  For  example,  if  an  absolute 
value  operation  has  been  inserted  in  front  of  a  variable  which  does  not 
appear in the invariant pool for that statement the processor could
prompt  "Is  X  always  greater  than  or  equal  to  0?".  If  the  user  replies  in 
the  affirmative  the  mutant  is  flagged  as  equivalent. 
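
A rule of this kind might be sketched as follows. This Python fragment is a hypothetical illustration: the function name, the representation of the invariant pool as a mapping from variable names to known lower bounds, and the `ask` callback are assumptions made for the example rather than features of the processor described above.

```python
def abs_insertion_equivalent(var, lower_bounds, ask):
    """Decide whether replacing var by ABS(var) preserves equivalence.

    lower_bounds: variable name -> lower bound known from the invariant pool
    ask: callable that poses a yes/no question to the user, returns a bool
    """
    bound = lower_bounds.get(var)
    if bound is not None and bound >= 0:
        return True                       # the rule applies without help
    # The pool cannot settle the question, so turn to the user.
    return ask(f"Is {var} always greater than or equal to 0?")


questions = []

def scripted_user(question):
    """Stand-in for an interactive prompt; records the question, says yes."""
    questions.append(question)
    return True


print(abs_insertion_equivalent("I", {"I": 1}, scripted_user))   # True
print(abs_insertion_equivalent("X", {"I": 1}, scripted_user))   # True
print(questions)   # ['Is X always greater than or equal to 0?']
```

Only the second call reaches the user, because I already has a non-negative lower bound in the pool.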

6.0  REMARKS 

It  has  been  shown  above  how  many  techniques  from  compiler 
optimization  can  be  applied  to  detect  equivalent  mutants  of  a  program. 
Several areas remain to be explored, however.

In  the  EXPER  system  only  first  order  mutations  are  considered 
(i.e.  mutants  coming  from  one  program  change),  but  conceivably  some 
higher  order  mutants  may  be  worthy  of  consideration.  In  many  cases  the 
heuristics  described  here  can  be  extended  very  easily  to  detect 
equivalent  mutants  of  higher  order.  It  is  also  true  that  in  many  cases 
equivalence  can  be  tested  transitively,  i.e.  if  program  P  is  equivalent 
to  P'  and  P'  is  equivalent  to  P"  then  P  is  equivalent  to  P".  However, 
it is often true that a high-order mutant can be equivalent to some
program without having intermediate mutants equivalent to either. For
instance the following program fragments are equivalent:

IF(I.EQ.1) GOTO 1

and

IF(--I.EQ.0) GOTO 1

However, neither is necessarily equivalent to either of the intermediate
mutants

IF(I.EQ.0) GOTO 1

or

IF(--I.EQ.1) GOTO 1
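
A quick check of the four conditions makes the point concrete. The Python below is only a sketch, and it assumes that the unary operator applied to I denotes the value I-1.

```python
def original(i):
    return i == 1            # IF(I.EQ.1)


def second_order(i):
    return (i - 1) == 0      # decremented I compared with 0


def intermediate_a(i):
    return i == 0            # IF(I.EQ.0)


def intermediate_b(i):
    return (i - 1) == 1      # decremented I compared with 1


values = range(-10, 11)
print(all(original(i) == second_order(i) for i in values))     # True
print(all(original(i) == intermediate_a(i) for i in values))   # False
print(all(original(i) == intermediate_b(i) for i in values))   # False
```

The second order mutant agrees with the original everywhere on the sampled range, while each intermediate mutant disagrees for some value of I.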

Fortunately the problem of equivalence of high order mutants is not a
serious problem because of the Coupling Effect: Test data that screens
out all first order mutants will screen out all higher order mutants (2).
Thus only first order mutants need to be considered in evaluating test
data.


A more interesting problem involves the detection of equivalences
which are very dependent on the form in which the programmer has chosen
to express his algorithm. As an example consider the fragment below
which tests whether or not a number N is prime.

      IF(N.LE.2) GOTO 3
      L=N-1
      DO 1 I=2,L
      IF(N.EQ.(N/I)*I) GOTO 2
    1 CONTINUE
    3 PRIME=.TRUE.
      RETURN
    2 PRIME=.FALSE.
      RETURN

It is really only necessary to let the DO loop run from 2 to
INT(SQRT(N)). The test N.LE.2 means that only N greater than or equal to
3 will be used as upper limits for the loop. Since INT(SQRT(3))=1,
INT(SQRT(N))<=N-2 for every such N. Thus the mutation which replaces L
with --L in this loop is equivalent to the original. Because the
equivalence of this mutant is so closely related to the conceptual nature
of the program it seems very difficult to automatically prove it. This
problem might be solved through the interactive part of the
post-processor. Specifically, it is easy to find out where the mutant
occurred, and the processor could simply ask "Is it acceptable for this
loop to be executed from 2 to L-1?".
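
The claim can at least be tested empirically. The Python sketch below mirrors the fragment and is offered as an illustration only; the `shrink` parameter is a device of this example that models the mutant's upper limit L-1. Note one assumption: Python's `range` executes zero times when the upper limit is below 2, whereas some FORTRAN dialects execute a DO loop at least once; for N=3 both behaviours give the same answer.

```python
def is_prime(n, shrink=0):
    """Mirror of the fragment; shrink=1 models the mutant with L-1."""
    if n <= 2:
        return True                      # label 3, exactly as in the fragment
    upper = (n - 1) - shrink             # L = N-1, or L-1 in the mutant
    for i in range(2, upper + 1):
        if n == (n // i) * i:            # N.EQ.(N/I)*I with integer division
            return False                 # label 2
    return True                          # label 3


same = all(is_prime(n) == is_prime(n, shrink=1) for n in range(-5, 2000))
print(same)   # True
```

Agreement on a finite range is evidence, not proof, which is why the text proposes asking the user.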

Several  techniques  for  detecting  equivalent  mutants  have  been 
described.  These  techniques  should  be  capable  of  finding  a  significant 
number  of  cases  in  which  a  mutant  is  equivalent  to  the  original  program, 
since  experience  indicates  that  most  equivalences  are  very  simple  ones. 
Often  they  involve  the  insertion  of  the  absolute  value  operator,  a  case 
that  is  particularly  easy  to  detect  using  invariant  propagation.  More 
complex  equivalences  can  be  tested  interactively  with  the  user.  The 
questions  thus  posed  should  help  the  user  decide  whether  or  not  to 
manually  declare  a  mutant  equivalent  to  the  original  program. 

Several  questions  concerning  equivalence  detection  remain  open.  At 
several  points  in  the  above  discussion  it  is  asserted  that  the  data 
needed  to  determine  equivalence  (e.g.  flow  graphs,  invariant  pools,  etc.) 
can  be  derived  efficiently  from  the  corresponding  data  for  the  original 
program  and  the  mutant's  descriptor  record.  While  these  assertions  are 
undoubtedly  true  in  many  cases,  exactly  how  often  remains  unknown. 

Further experimentation is required in this area, particularly with
regard to the following questions:

1. In what fraction of the cases is it necessary to generate a flow
graph for a mutant from scratch?

2. In what fraction of the cases is it necessary to regenerate the
invariant pools for a mutant?

3. It is unlikely that a change to an invariant pool will affect only
that pool. On the average, how many pools will be affected? How
does the cost of determining all effects compare to the cost of
re-computing the invariant pools?


REFERENCES


Davis, Martin. Computability and Unsolvability (McGraw-Hill Co., New
York, New York: 1958).

DeMillo, Richard A.; Lipton, Richard J.; and Sayward, Frederick
G. "Hints on Test Data Selection: Help for the Practicing
Programmer", reprinted from Computer 11, 4 (April 1978), pp. 34-43.

Kildall, Gary A. "A Unified Approach to Global Program Optimization",
in Conference Record of ACM Symposium on Programming Languages,
pp. 194-205, 1973.

Lipton, Richard J. and Sayward, Frederick G. "The Status of Research
on Program Mutation", reprinted from Digest for the Workshop on
Software Testing and Test Documentation, Dec. 1978, pp. 355-373.

Schaefer, Marvin. A Mathematical Theory of Global Program
Optimization (Prentice Hall, Englewood Cliffs, N. J.: 1973).




DESCRIPTION

The COBOL Pilot Mutation System (CPMS) is being
developed at the Georgia Institute of Technology by Allen
Acree, Rich DeMillo, Jeanne Hanks, and Fred Sayward. It is
based in part on the Pilot Mutation System (PIMS) for
FORTRAN designed at Yale University, and implemented at Yale
University, Georgia Institute of Technology, and the
University of California, Berkeley.

Program mutation is a methodology for program testing.
Its underlying assumption is that programmers produce
programs that are, in some sense, nearly correct. The goal
of the mutation system is to aid in the selection of good
test data by taking advantage of this fact. A mutation of a
program P is a program P' that differs from P in only a
single minor change, such as substituting one variable for
another in an assignment or changing a + to a - in an
arithmetic expression. Usually the number of simple mutants
of P grows quadratically with the size of P. Naturally, some
of these mutations will produce mutant programs that are
functionally equivalent to the original, but for the others
we should be able to find test data that will distinguish
between the original program and any mutant.

CPMS is designed to take as input a fixed program P,
and to automatically produce mutants of it according to a
set of mutant operators. The system will then accept test
cases from the user, run the original program and all its
mutants on them, and tell the user how many mutants have been
"killed". (A mutant is killed when it fails by program
fault or produces a different output than the original
program.) The aim, of course, is to kill all the mutants,
or at least to kill enough so that the user is reasonably
certain that those remaining are functionally equivalent to
the original and could never be killed. At this point the
user has a set of test data that is sufficiently powerful to
distinguish between the original program and all its simple
(nonequivalent) mutants. According to the coupling
hypothesis this test data will also be sufficiently powerful
to distinguish between the original program and any other
program "close" to it (multiple mutations). This
hypothesis has been proved for certain classes of programs
and for certain definitions of "close", and theoretical work
continues in this area. Recent experiments with higher
order mutants of FORTRAN routines also support this
hypothesis.

Thus the user can, with the aid of CPMS, produce test
data that will distinguish between the program used as input
and any program "close" to it. Since we assume that the
program used as input is close to a correct program, the
test data will be sufficient to distinguish between the
input program and the correct program, if they are not
equivalent. So the test data will be sufficient to
demonstrate program correctness, to a high degree of
certainty.
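
The kill cycle described above can be imitated on a toy scale. The Python sketch below is not CPMS: the one-line "program", the three mutants, and the helper names `run` and `killed` are assumptions chosen for illustration. It shows weak test data killing nothing, stronger data killing every nonequivalent mutant, and an equivalent mutant that no test case can kill.

```python
import operator


def make_program(op):
    """Return a one-line 'program' computing op(x, y); stands in for P."""
    return lambda x, y: op(x, y)


original = make_program(operator.add)
mutants = {
    "+ -> -": make_program(operator.sub),
    "+ -> *": make_program(operator.mul),
    "x+y -> y+x": lambda x, y: y + x,     # functionally equivalent mutant
}


def run(program, case):
    """Execute one test case; a program fault counts as a distinct result."""
    try:
        return ("ok", program(*case))
    except Exception as exc:
        return ("fault", type(exc).__name__)


def killed(test_cases):
    """Names of mutants whose results differ from the original's."""
    expected = [run(original, c) for c in test_cases]
    return {name for name, m in mutants.items()
            if [run(m, c) for c in test_cases] != expected}


print(sorted(killed([(0, 0)])))            # []
print(sorted(killed([(2, 2)])))            # ['+ -> -']
print(sorted(killed([(2, 2), (2, 3)])))    # ['+ -> *', '+ -> -']
```

The case (2, 2) fails to separate addition from multiplication, which is the kind of weakness in test data that a surviving mutant is meant to expose.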


IMPLEMENTATION NOTES


The user of CPMS provides the name of the file
containing the source program. This program should be in the
subset of the COBOL language specified elsewhere. CPMS parses
this source program into an internal form suitable for
interpretive execution. This internal form is also suitable
for "decompilation", and the user is provided with a
decompiled version of his program. This "source listing" may not
be textually identical to the original source, but it should
be equivalent.


The system then produces an internal file of all
mutations of the original program. These are stored, not as
complete programs, but rather as short descriptions of how a
mutant is to be created. The user is then asked to provide
a file or files of test data for his program. These files
may be created outside CPMS using the editor, or they may be
created "on the fly" in CPMS, with editing capability being
restricted to backspace and line delete. However the user
chooses to provide the input files, CPMS interpretively
executes the source program on this test data, saving the
output. The user may examine the output and decide whether
or not to accept it. If he does, then the test data is run
against all enabled mutants, and the results of each are
compared to the results of the source. A mutant producing a
different result is marked "killed". The user is then
presented with a statistical summary. If he wishes, he may
also examine more detailed information about the mutants
still living. Then the cycle repeats until either an error
is uncovered in the original program, or the user is
satisfied that all remaining mutants are equivalent to the
original. A CPMS experiment may be interrupted and
continued later, with the system saving all information
necessary for the resumption of the run.
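
Storing mutants as short descriptions rather than complete programs might look like the following. This Python sketch is hypothetical: the `Descriptor` fields, the token-list internal form, and the function `apply_descriptor` are assumptions made for illustration and say nothing about the actual CPMS record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    statement: int      # index of the statement to change
    position: int       # index of the token within that statement
    replacement: str    # token to substitute


def apply_descriptor(internal_form, d):
    """Build the mutant named by d; the original form is left untouched."""
    mutant = [list(stmt) for stmt in internal_form]
    mutant[d.statement][d.position] = d.replacement
    return mutant


program = [
    ["COMPUTE", "C", "=", "A", "+", "B"],
    ["MOVE", "A", "TO", "D"],
]
descriptors = [
    Descriptor(0, 4, "-"),    # operator substitution in the COMPUTE
    Descriptor(1, 1, "B"),    # scalar for scalar replacement in the MOVE
]

for d in descriptors:
    print(" ".join(apply_descriptor(program, d)[d.statement]))
# COMPUTE C = A - B
# MOVE B TO D
```

Each mutant costs three small fields of storage here, and the full program text is materialised only when the mutant is about to be executed.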


In response to the experience of trying to transfer
PIMS from one environment to another, we have decided to try
to do as much as possible to isolate machine dependencies.
At the risk of possible inefficiencies, we will concentrate
references to file access techniques, character storage,
word length, and such machine- and operating
system-dependent features in a few small routines. For example,
PIMS contained 7? random access calls in the DEC FORTRAN
dialect. Each of these had to be rewritten as a PRIMOS call
during the transfer procedure. In CPMS, all random access
will be through the routines K I A R A N and WRTRAV. Those two
(small) routines are all that need to be modified to
interface CPMS with a different operating system. Some machine
dependency is tolerated in the interpretive execution phase
of CPMS, since this is the most time-consuming phase of the
mutation process. However, this dependency is kept to a
minimum even here. The buffers used in interpretively
executing programs are integer arrays of one or two
dimensions. The sizes of the arrays are parameters. We
assume in designing these arrays that a single integer
consists of at least 16 bits (i.e. integers are
restricted, wherever possible, to a range of +/- 32767).
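
The idea of funnelling all random access through two small routines can be sketched as below. This Python is an illustration only: `read_random`, `write_random`, and `RECORD_SIZE` are hypothetical names invented for the example and are not the routines named in the text. The rest of a system written this way never touches the file API directly, so a port changes only these two functions.

```python
import io

RECORD_SIZE = 8   # bytes per record; an arbitrary value for this sketch


def read_random(f, index):
    """Return record number index from the open binary file f."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)


def write_random(f, index, record):
    """Store record (exactly RECORD_SIZE bytes) as record number index."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record has the wrong length")
    f.seek(index * RECORD_SIZE)
    f.write(record)


store = io.BytesIO()                      # stands in for a disk file
write_random(store, 0, b"MUTANT00")
write_random(store, 1, b"MUTANT01")
print(read_random(store, 1))              # b'MUTANT01'
print(read_random(store, 0))              # b'MUTANT00'
```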


NOTES ON THE COBOL PILOT MUTATION SYSTEM


1. We limit ourselves to a simple subset of the language.

2. We limit ourselves to two nonrewindable sequential input
files and two nonrewindable sequential output files.
This should be sufficient for such common applications
as making sorted transactions against a sorted master
file and producing a transaction report and an updated
master file.

3. Rather than providing for a "predicate subroutine" as in
PIMS we will simply check mutant output against original
program output to determine whether they have read the
same number of input records and produced identical
output files. It is believed that just checking record
counts on input and output will eliminate many mutants
without more detailed comparisons.


4. Mutations to be performed:

1  DECIMAL ALTERATION - Move implied decimal in
   numeric items one place to the left or right, if
   possible.

2  REVERSE TWO-LEVEL TABLE DIMENSIONS

3  OCCURS CLAUSE ALTERATION - Add or subtract one
   from an OCCURS clause.

4  INSERT FILLER - of length one between two items
   in a record.

5  FILLER SIZE ALTERATION - Add or subtract one
   from length.

6  ELEMENTARY ITEM REVERSAL

7  FILE REFERENCE ALTERATION

8  STATEMENT DELETION - Replace by null operation.

9  GO TO --> PERFORM

10 PERFORM --> GO TO

11 THEN - ELSE REVERSAL - Negate condition.

12 STOP STATEMENT SUBSTITUTION

13 THRU CLAUSE EXTENSION

14 TRAP STATEMENT REPLACEMENT

15 SUBSTITUTE ARITHMETIC VERB

16 SUBSTITUTE OPERATOR IN COMPUTE

17 PARENTHESIS ALTERATION - Move one parenthesis
   one place to the left or right

18 ROUNDED ALTERATION - Change ROUNDED to
   truncation, and vice versa.

19 MOVE REVERSAL - reverse direction of move in
   simple MOVE A TO B, if the result would be legal
   in COBOL.

20 LOGICAL OPERATOR REPLACEMENT

21 SCALAR FOR SCALAR REPLACEMENT - Substitute one
   (non-table) item reference for another, where
   the result would be legal.

22 CONSTANT FOR CONSTANT REPLACEMENT

23 CONSTANT FOR SCALAR REPLACEMENT

24 SCALAR FOR CONSTANT REPLACEMENT

25 NUMERIC CONSTANT ADJUSTMENT
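
The output check described in note 3 above, where record counts are compared before any detailed comparison, might be sketched as follows. The Python below is an assumption-laden illustration: the dictionary layout of a run and the function name `differs` are invented for this example and do not describe CPMS internals.

```python
def differs(original, mutant):
    """True if the mutant run is distinguishable from the original run.

    Each run is a dict with 'read' (records read per input file) and
    'outputs' (one list of records per output file).
    """
    # Cheap checks first: counts of records read and written.
    if mutant["read"] != original["read"]:
        return True
    if [len(f) for f in mutant["outputs"]] != [len(f) for f in original["outputs"]]:
        return True
    # Only when the counts agree is the detailed comparison needed.
    return mutant["outputs"] != original["outputs"]


base = {"read": (3, 0), "outputs": [["A", "B", "C"], []]}
short_read = {"read": (2, 0), "outputs": [["A", "B", "C"], []]}
wrong_text = {"read": (3, 0), "outputs": [["A", "X", "C"], []]}
same = {"read": (3, 0), "outputs": [["A", "B", "C"], []]}

print(differs(base, short_read))   # True
print(differs(base, wrong_text))   # True
print(differs(base, same))         # False
```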




COBOL SUBSET ACCEPTED BY CPMS

IDENTIFICATION DIVISION.

PROGRAM-ID. program-name.

[AUTHOR. comment-entry.]

[INSTALLATION. comment-entry.]
[DATE-WRITTEN. comment-entry.]
[DATE-COMPILED. comment-entry.]

[SECURITY. comment-entry.]
[REMARKS. comment-entry.]

ENVIRONMENT DIVISION.

CONFIGURATION SECTION.

SOURCE-COMPUTER. comment-entry.

OBJECT-COMPUTER. comment-entry.
[SPECIAL-NAMES. r f f •_ 1 IS mnemonic-name]

INPUT-OUTPUT SECTION.
FILE-CONTROL.

SELECT file-name ASSIGN TO {INPUT1 | INPUT2 | OUTPUT1 | OUTPUT2}.


DATA DIVISION.
FILE SECTION.

FD file-name RECORD CONTAINS integer CHARACTERS

LABEL RECORDS {STANDARD | OMITTED}

DATA RECORD IS data-name.

level-number {data-name | FILLER}

[REDEFINES data-name-2]

[PICTURE IS character-string]

[OCCURS integer TIMES]
[VALUE IS literal]


WORKING-STORAGE SECTION.

[77 level entries.]
[record entries....]


PROCEDURE DIVISION.

[paragraph-name.]

ADD {ident-1 | lit-1} [ident-2 | lit-2] ... {TO | GIVING} ident-m
    [ROUNDED] [ON SIZE ERROR imperative-statement] .

CLOSE filename-1 [filename-2] ...

COMPUTE identifier [ROUNDED] = arithmetic-expression
    [ON SIZE ERROR imperative]

DIVIDE {ident-1 | lit-1} {INTO | BY} {ident-2 | lit-2}
    [GIVING ident-3] [ROUNDED] [ON SIZE ERROR imperative] .

EXIT.

GO TO procedure-name-1 [[procedure-name-2] ...
    DEPENDING ON identifier] .

IF condition {statement-1 | NEXT STATEMENT}
    [ELSE {statement-2 | NEXT STATEMENT}] .

MOVE ident-1 TO ident-2 [ident-3] ...

MULTIPLY {ident-1 | lit-1} BY {ident-2 | lit-2}
    [GIVING ident-3] [ROUNDED] [ON SIZE ERROR imperative] .

OPEN [INPUT filename-1 [filename-2]]
    [OUTPUT filename-3 [filename-4]]

PERFORM procedure-name-1 [THRU procedure-name-2]
    [{ident-1 | integer-1} TIMES] .

READ filename RECORD [INTO identifier]
    AT END imperative

STOP RUN.

SUBTRACT {ident-1 | lit-1} [ident-2 | lit-2] ... FROM
    {ident-m | lit-m}
    [GIVING ident-n] [ROUNDED] [ON SIZE ERROR imperative] .

WRITE record-name [FROM identifier-1]
    [ADVANCING {ident-2 | integer | mnemonic} LINES] .


CPMS RUN


The four phases of the CPMS run are the ENTRY phase,
the PRE-RUN phase, the MUTATION phase, and the POST-RUN
phase. The ENTRY phase is executed only when the user first
enters the system. Thereafter the PRE-RUN, MUTATION, and
POST-RUN phases are executed cyclically.


I. The Entry Phase.

The session will begin when the user enters the system by
logging in and typing s e g £ L’ I (in either upper or lower
case). If all is well, the system will respond:

WELCOME TO THE COBOL PILOT MUTATION SYSTEM

followed by:

PLEASE ENTER THE NAME OF THE COBOL PROGRAM FILE:

The user should do just that. CPMS creates several working
files of its own, whose names are variations of the source
file name formed by adding suffixes to it. The system
checks to see if those working files already exist. If they
do, the user can either continue the previous run on that
source file where he left off, or he can start over from
scratch. Therefore, if the working files already exist, the
system asks:

DO  YOU  WANT  TO  PURGE  WORKING  FILES  FOR  A  FRESH  RUN  ? 

If  a  new  run  is  needed  the  system  begins  with  the  message 
PARSING  PROGRAM 

A syntax error in the source program automatically aborts
the CPMS run. The user must correct the error and re-enter
the system. Errors are reported to the user as a source
program line number and the probable cause.

The system then issues the messages

SAVING INTERNAL FORM

CREATING MUTANT DESCRIPTOR RECORDS

II. The Pre-Run Phase.

In  this  Phase  the  user  supplies  test  data  and  turns  on 
mutants.  The  system  asks 

DO  YOU  WANT  TO  SUBMIT  A  TEST  CASE  ? 

and the user should respond YES or NO. The system will ask

WHERE IS INPUT1 ?

(if there is a SELECT statement for INPUT1),
to which the user should respond HERE or <filename>.

If it is HERE, the user enters the input data directly,
ending with the control-f for end of file.

The system then goes through the same procedure for INPUT2,
if it has been named in a SELECT statement.

At this point the system will execute the program
interpretively on the test input. After finishing, the
output on files OUTPUT1 and OUTPUT2 will be displayed. The
user is asked:




IS  THIS  TEST  CASE  ACCEPTABLE  ? 

To which the user should respond YES or NO.

If YES, the test case (input and output, along with the time
used, and counts of records read) is catalogued for later
use with mutant programs. If NO, the test case is purged
from memory.

This process of entering test cases iterates until all input
cases for this pass have been entered.

At this time the system will ask

WHAT NEW MUTANT TYPES ARE TO BE CONSIDERED ?

The user should respond ALL or NONE or SELECT or should give
the numbers of the mutant types to be used next. SELECT
causes the system to list each type that has not yet been
considered, and then ask for types.

The list of numbers should be terminated with the command
STOP.

III. The Mutation Phase


At this time the test cases will be run against the mutant
programs. The time that this takes depends on the number of
test cases presented, the length and "density" of the
program, and the types of mutants currently being
considered.

After all the test cases have been executed for each mutant
still alive, the system will display the statistics of the
run, indicating the number of mutants created and the number
still alive of each type that has been considered, as well
as summary counts of how the dead mutants died.

IV. The Post-Run Phase

Now the user has a chance to view the mutants still
remaining (either all of them, or selected types, or one randomly
selected mutant of each type), or he can send information
about the run to an output file for later printing. To end
the post-run phase the user types either HALT, ending the
session, but saving the temporary files for future
resumption, or LOOP, sending the system back in a loop to the
pre-run phase to enter more test data, and/or consider new mutant
types.

The user may terminate the session at any time that a command
is requested by typing KILL.

The user can receive an explanation of his options at many
points in the cycle by typing HELP.
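
The cyclic structure of a run can be summarised in a few lines. The Python sketch below is illustrative only: the function `cpms_session` is a hypothetical name, it handles just the HALT and LOOP replies described above, and it ignores KILL and HELP.

```python
def cpms_session(replies):
    """Return the sequence of phases visited for the given post-run replies."""
    trace = ["ENTRY"]                     # executed once per session
    pending = iter(replies)
    while True:
        trace += ["PRE-RUN", "MUTATION", "POST-RUN"]
        reply = next(pending, "HALT").upper()
        if reply == "HALT":
            return trace                  # session ends, working files kept
        if reply != "LOOP":
            raise ValueError(f"unexpected reply: {reply}")


print(cpms_session(["loop", "halt"]))
# ['ENTRY', 'PRE-RUN', 'MUTATION', 'POST-RUN', 'PRE-RUN', 'MUTATION', 'POST-RUN']
```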


OTHER WORK IN PROGRESS


Mutation analysis depends on our ability to restrict our
attention to single mutations, to avoid a combinatorial
explosion in the number of mutations performed. This is
justified by the coupling hypothesis that says that any test
data that is strong enough to distinguish between a program
and all its nonequivalent single mutants is also strong
enough to distinguish the original program from more complex
mutants. A version of CPMS has been prepared to test this
hypothesis by randomly sampling higher order mutants,
executing them on test data that has been found sufficient
for first order mutants, and reporting any higher order
mutants that are not eliminated, along with statistics on
how all of the other mutants were eliminated. It is hoped
that this will provide us with an estimate of the likelihood
that a more complicated error would escape detection in the
mutation process.
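
The experiment described above can be imitated in miniature. The Python sketch below is a toy, not the CPMS experiment: the two-operator "program", the operator table `OPS`, the single test case, and the sample size are all assumptions chosen for illustration. Test data that kills every first order mutant is applied to a random sample of second order mutants, and any survivors are reported.

```python
import itertools
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
original = ("+", "*")                     # computes (x + y) * z


def run(ops, case):
    x, y, z = case
    return OPS[ops[1]](OPS[ops[0]](x, y), z)


def killed(ops, tests):
    return any(run(ops, t) != run(original, t) for t in tests)


def order(ops):
    """Number of operator positions in which ops differs from the original."""
    return sum(a != b for a, b in zip(ops, original))


candidates = list(itertools.product(OPS, repeat=2))
first_order = [m for m in candidates if order(m) == 1]
second_order = [m for m in candidates if order(m) == 2]

tests = [(2, 3, 4)]                       # adequate for first order mutants
assert all(killed(m, tests) for m in first_order)

sample = random.Random(0).sample(second_order, 3)
survivors = [m for m in sample if not killed(m, tests)]
print(len(first_order), len(second_order), survivors)   # 4 4 []
```

In this toy no sampled second order mutant survives, which is the outcome the coupling hypothesis predicts; a real experiment would report any survivors for inspection.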


PILOT  MUTATION  SYSTEM  (PIMS) 
USERS'  MANUAL 


by 

Donald  M.  St.  Andre 

School  of  Information  and  Computer  Science 
Georgia  Institute  of  Technology 
Atlanta,  Georgia  30332 




TABLE OF CONTENTS


CHAPTER

I.   Introduction

II.  Theoretical Overview

III. PIMS User's Guide

IV.  Implementation And Portability Discussion

V.   Summary

APPENDICES

A. Error Messages

B. FORTRAN Language Subset

C. Commands And Abbreviations

D. Description Of The Mutations Performed

E. Entering And Modifying Files

F. Sample PIMS Run

BIBLIOGRAPHY




CHAPTER  I 

INTRODUCTION 

A familiarity with the PRIMOS operating system and
the file management system, as applied to the PRIME-400
computer, and a familiarity with the Software Tools
Subsystem [17] is assumed. Detailed discussion of the
respective command syntax will be avoided except where
required for clarity or completeness.

This paper is a discussion of the operational
processes and the implementation of a pilot system for
performing program mutations. Since this is an operational
discussion I will not attempt a detailed theoretical study.
Such studies are available in [14,15].

I will also describe the operation of a pilot
mutation system for performing mutations on FORTRAN
subroutines as a means for testing program correctness. The
system will accept an input file which is assumed to contain
a FORTRAN subroutine valid in the language subset (see
Appendix B). Mutations are generated according to operator
commands and each mutant is checked for correctness.

The system is divided into three operational phases:
a pre-run phase, a mutation phase, and a post-run phase. In
the present implementation, the entire system is resident in
approximately h7h words of virtual memory. The phases are
independent enough procedurally that they may be overlayed
for use on a smaller memory configuration.

In the pre-run phase files are opened or created and
instructions are accepted for the processing in other
phases. If an "internal-form" file does not exist, a run is
called an initial run. Throughout all phases of system
operation, activities are different for initial and
subsequent runs.

In the mutation phase, mutants are created and
tested. It is the mutation phase that is the central part
of the system. During the post-run phase, statistics are
displayed about the mutations tested thus far in the
processing. In addition, files are closed for use in
subsequent runs. One feature of the system design is the
ability to execute the three phases in sequence and then
loop back to the pre-run phase for additional processing.
In this manner, a user can operate the system and gain
insight which is then used in tailoring the responses in the
next pass. This repetitive refinement of the test data
contributes greatly to the rapid convergence on a set of
acceptable test data [14].




CHAPTER  II 

THEORETICAL OVERVIEW

An increasing productivity burden on software
developers has contributed to the increased use of several
aids for the design, implementation and debugging of
large-scale software products. However, these aids are
intended for the actual programmers and first-level
management. They provide qualitative descriptions rather
than quantitative information that may be used throughout a
management hierarchy. The typical manager will ask
questions like, "How close is the project to something that
the users will find acceptable as a first release?" and,
"How well has the program been tested?" The techniques of
modularizationtU], structured programmingMO, and program
verification[9,10] do not seem to answer these and similar
questions. This lack of answers appears understandable
because management should not be expected to understand
programming languages and/or sophisticated mathematics. In
this chapter, we will explain how a new testing approach
known as program mutation[15] can be used to manage software
effectively.

A statement of the program testing problem, as seen
by management, might be: Given a program module and its
associated test data, how well does the data test the
module--in quantitative terms? To solve this problem,
program mutation provides a quantitative measure of the
"goodness" of the test data. We make the assumption that
the better the test data (i.e., the more complete) the more
thoroughly a program has been tested. And in a somewhat
simplistic fashion, the more thorough the testing, the more
confidence can be placed in a program's correctness.

A pilot system which performs program mutation[15]
produces a "score" which indicates the adequacy of the test
data. The users attempt to improve upon this "score" by
either augmenting the test data or by answering "questions"
about the program being tested. These questions address the
essence of the program by forcing the user to compare
alternate forms of a given statement and to make a decision
whether the many forms are equivalent. This process of
supplying test data, answering questions and interpreting
results continues interactively until the user is satisfied
with the quality of the test data. Meanwhile, all of the
data remains available at each iteration for management
review so that a quantitative answer to "how well a program
has been tested" can be obtained.
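
The "score" idea can be sketched in a few lines. This is only an illustration under an assumed formula (the fraction of non-equivalent mutants that the test data kills); the function name and the formula itself are assumptions and are not taken from the pilot system.

```python
def mutation_score(total, killed, equivalent):
    """Fraction of non-equivalent mutants killed by the test data.

    Assumed formula: killed / (total - equivalent). Equivalent mutants
    are excluded because no test data can ever kill them.
    """
    if min(total, killed, equivalent) < 0 or killed + equivalent > total:
        raise ValueError("inconsistent mutant counts")
    candidates = total - equivalent  # mutants that could in principle be killed
    if candidates == 0:
        return 1.0  # nothing left for the test data to kill
    return killed / candidates
```

Under this assumption, augmenting the test data raises `killed`, while answering an equivalence question raises `equivalent`; either action moves the score toward 1.0.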

Mutation Methodology

Program testing cannot be deductive. We know this
since program testing attempts to derive finite test data
which implies general correctness. Test data of this kind
is known as "adequate test data." Hence, since adequate test
data cannot in general be derived algorithmically[4],
program testing is not deductive. For this inductive
process, we are therefore trying to answer a fundamental
question, "If a program is correct on some finite number of
test cases, is it correct in general?" Several methods have
emerged which allow one to gain confidence in test data
adequacy. These methods include path analysis[1,2,5,6] and
an associated technique, symbolic execution[7,c]. The basic
idea of path analysis is to exercise all control paths
within a program. Symbolic execution attempts to derive the
test data necessary to do this. Test data known to exercise
each flowchart path at least once is better than test data
that does not. It should be apparent that the possibility
of faulty analysis is very real[3].

Let us approach the problem of testing from a
different viewpoint and assume that experienced programmers
write programs which are either correct or are "almost"
correct. Stated more formally:

If a program is not correct, then it is
a "mutant"--that is, it differs from a
correct program by simple, well
understood errors[11].

Errors have been found to fall into one of three broad
categories[12]. First, the specifications may be
misunderstood. Second, the specifications may be
implemented incorrectly; these are the so-called "logical
errors." Third, the errors may be of a purely clerical
nature. The program mutation methodology can lead to the
detection of all three error types[16].




CHAPTER  III 

PIMS  USER'S  GUIDE 

This document, in conjunction with the appendices,
describes how to use a terminal to operate PIMS on the
PRIME-400 computer. All communications to PIMS must be in
capital letters. Lower case letters are treated as errors
by PIMS.

PIMS consists of three sequentially executed phases
which are called the "Pre-Run Phase," the "Mutation Phase,"
and the "Post-Run Phase." Throughout these phases errors
may occur; the types of errors detected by PIMS, the
error messages, and PIMS' reaction to errors are described
in "PIMS Error Messages" (see Appendix A). In this chapter
it is assumed that no errors take place.

In the Pre-Run Phase the user tells PIMS what program
is to undergo mutation analysis, describes those aspects of
the program and the test data needed by PIMS to execute
mutations, describes the types of mutations he wants done,
and partially describes the contents of his output file.
The user may also request that certain status information be
displayed on the terminal. During the Pre-Run Phase the
user may terminate his run, leaving his transient files
unchanged, by issuing a KILL response as a reply to any PIMS
prompt for input.

In the Mutation Phase PIMS creates and executes
mutants. There is no user interaction during this phase.

In the Post-Run Phase the user completes his
description of his output file. He may also request that
certain status information be displayed on the terminal.

Running PIMS

In the explanations and examples that follow, "space"
characters are significant and should be used exactly as
shown. In addition, any response to a terminal question
should be terminated by a "carriage-return" or "newline"
character as appropriate for the user's specific terminal.
To begin the execution of PIMS the user types the following
command:

OK, SEG RUN>PIMS      (see note below)

PIMS responds, as soon as it has been loaded, by displaying
the following message:

PRE-RUN PHASE

ALL INPUT MUST BE IN UPPER CASE


NOTE:

Non-casual users of PIMS should consult with the PIMS
staff for details about login and file integrity.


Pre-Run Phase


The Pre-Run Phase consists of six sequentially
executed parts, some optional depending on whether or not
PIMS is being run for the first time on the given program.
PIMS requests the name of the Raw Program File by displaying
the following message:

ENTER THE RAW PROGRAM FILE NAME.

Raw Program Files are created using a text editor prior to
entering PIMS. A short tutorial on using this editor is
available as a login option (see also Appendix F). The user
types, on the next line, exactly six characters which tell
PIMS the file in which the raw program resides. This also
sets the file nomenclature convention.




PIMS File Name Convention

The raw program file name is exactly six characters.
We represent this six character name by the symbol <name>.
Then the PIMS system files are created and may be accessed
with the following suffixes.


<name>.I ..... Internal Form File
<name>.T ..... Test Data File
<name>.C ..... Correctness Descriptor File
<name>.P ..... Mutant Information File
<name>.C ..... Report Output File
<name>.D ..... New Test Data File
<name>.N ..... New Mutant Information File
<name>.r ..... Predicate-subroutine internal file


NOTE: The user is referred to Appendix F for the
details of creating, editing, and maintaining the
raw program files which will be processed by PIMS.

PIMS determines the run type, either an initial run
on the program or a subsequent run, by searching for a file
with the six character name entered and a suffix of ".I". If
this file is found, the run is considered to be a subsequent
run; if it is not found, the run is considered initial.
Once the system determines that a run is subsequent, the
user is given the opportunity to discard all previous files
and start over.
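
The run-type rule can be illustrated with a short sketch. It assumes an ordinary file system in which the internal form file is simply named <name>.I; the function name and the returned labels are hypothetical and are not part of PIMS.

```python
import os


def run_type(name, directory="."):
    """Classify a run by looking for the internal form file <name>.I.

    Returns "SUBSEQUENT" when the file exists and "INITIAL" otherwise.
    The labels are illustrative only.
    """
    if len(name) != 6:
        raise ValueError("raw program file name must be exactly six characters")
    internal_form = os.path.join(directory, name + ".I")
    return "SUBSEQUENT" if os.path.exists(internal_form) else "INITIAL"
```

Discarding all previous files, as the text allows, would make the next call report an initial run again.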


During an initial run, PIMS accepts instructions
about the routine being tested and any associated test
cases. These instructions consist of the sub-parts
described below.


(1) Classification of the Formal Parameters

PIMS requests that the user categorize each formal parameter
(for illustrative purposes let the variable be named X) by
successively displaying, until all parameters are
categorized, the following message:

CATEGORIZE FORMAL PARAMETER X

The user then types the keyword corresponding to one of the
categories INPUT, OUTPUT, or INPUT/OUTPUT, or types HELP if
he has forgotten details and wants PIMS to display the
command keywords.

(2) Mutant Correctness Option


To determine whether mutant correctness is determined by the
"predicate subroutine" method or the "same as the program"
method, PIMS displays the following message:

IS MUTANT CORRECTNESS DEPENDENT ON A PREDICATE
SUBROUTINE?

TYPE YES OR NO

The user types in the appropriate reply. If YES is entered,
PIMS displays the predicate subroutine statement it has
found in the Predicate Subroutine File. The user creates
this file prior to any initial runs with the appropriate
file name as described under file name conventions.

PREDICATE SUBROUTINE STATEMENT

(the predicate subroutine name and formal
parameters)
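
The two correctness methods can be contrasted in a small sketch. It is not the PIMS implementation: the function name, the calling convention of the predicate, and the treatment of run-time failures are all assumptions made for illustration.

```python
def mutant_is_correct(mutant, program, test_cases, predicate=None):
    """Return True if the mutant gives acceptable results on every test case.

    With a predicate, acceptability is decided by predicate(args, result)
    (the "predicate subroutine" method). Without one, the mutant must
    reproduce the original program's result (the "same as the program"
    method).
    """
    for args in test_cases:
        try:
            result = mutant(*args)
        except Exception:
            return False  # assumption: a mutant that fails to run is incorrect
        if predicate is not None:
            acceptable = predicate(args, result)
        else:
            acceptable = result == program(*args)
        if not acceptable:
            return False
    return True
```

A mutant for which this returns True on all current test data is still live and calls for further analysis.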

(3) Creation of the Test Data File

At this point PIMS is ready to receive the test data from
the user and signifies this by displaying the following
message:

HOW MANY TEST CASES ARE TO BE SPECIFIED?

The user enters an appropriate count. For each test case,
PIMS prompts the user to enter values for the input formal
parameters of the program. First PIMS requests the values
of the scalar parameters, then the one dimensional array
formal parameters, and finally the two dimensional arrays,
all in a manner to be described below. Requests for a
specific test case are signalled by PIMS displaying the
following message:

SPECIFY TEST CASE i:

The values of the scalars are requested by PIMS, five at a
time, until all scalars are satisfied, by iterating the
message

ENTER VALUES FOR V1 V2 V3 V4 V5




The user then inputs the numbers. Should the user have a
large volume of test data, he may enter the keyword FILE at
this point. The system will ask for a file name and read
the test case data from that file. Single dimensioned
arrays are input one at a time by PIMS requesting values for
specific array elements until all values have been entered.
For example, let the array be named A and its dimension be
7. (NOTE: PIMS gets this dimension from the program's
DIMENSION statement--it must be either a constant or an
input formal parameter scalar variable.) The session would
be as follows: PIMS displays

ENTER VALUES FOR A(1) A(2) A(3) A(4) A(5)

The user would enter five numbers. PIMS then displays

ENTER VALUES FOR A(6) A(7)

The user would enter the final two numbers. PIMS then
repeats this process for another one dimensional array or
goes on to request values for the two dimensional arrays.

In the case where a user wants to input an array
partially defined, he enters UND rather than a number for
the undefined array elements. Only numeric data of the
type INTEGER may be processed. PIMS requests the values
for two dimensioned array elements in a manner similar to
that for single dimensioned arrays. The values are
requested in row-major order, five at a time. For example,
if A is of dimension (2,7) PIMS will make the following four
prompts

ENTER VALUES FOR A(1,1) A(1,2) A(1,3) A(1,4) A(1,5)

ENTER VALUES FOR A(1,6) A(1,7)

ENTER VALUES FOR A(2,1) A(2,2) A(2,3) A(2,4) A(2,5)

ENTER VALUES FOR A(2,6) A(2,7)
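
The prompting pattern (row-major order, at most five elements per prompt, with a fresh prompt at the start of each row) can be reproduced with a short sketch. The function name is hypothetical, and the sketch only generates the prompt text; it does not read any input.

```python
def element_prompts(name, dims, per_prompt=5):
    """Build the sequence of 'ENTER VALUES FOR ...' prompts for one array.

    dims is a tuple of one or two extents. Elements are grouped row by
    row, with at most per_prompt elements in each prompt.
    """
    if len(dims) == 1:
        rows = [[f"{name}({i})" for i in range(1, dims[0] + 1)]]
    elif len(dims) == 2:
        rows = [[f"{name}({i},{j})" for j in range(1, dims[1] + 1)]
                for i in range(1, dims[0] + 1)]
    else:
        raise ValueError("only one and two dimensional arrays are handled")
    prompts = []
    for row in rows:
        for start in range(0, len(row), per_prompt):
            group = row[start:start + per_prompt]
            prompts.append("ENTER VALUES FOR " + " ".join(group))
    return prompts
```

Calling `element_prompts("A", (2, 7))` yields four prompts, matching the example for an array of dimension (2,7).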

(4) Additional Test Cases

When additional test cases can be added to the test case
file of the given program, PIMS displays the following
message:

HOW MANY NEW TEST CASES FOR THIS RUN?

The user then enters the appropriate count. PIMS then
prompts the user to specify the new test cases in the same
manner as described above in the "Creation of the Test Data
File" subsection. The result is to extend the Test Data
File. Test cases cannot be deleted.

(5) Addition of and Status of Mutant Types Considered


To see if new types of mutants are to be considered for this
run, PIMS displays the following message:

WHAT NEW TYPES OF MUTANTS ARE TO BE CONSIDERED?

At this point the user has several options. He may type in
any of the following replies:

NONE --- Part (5) terminates.

HELP --- PIMS displays all the code names of the
    mutant types as described in Appendix C. Part (5)
    is then re-executed by PIMS.

ALL --- Every type of mutation will be
    considered. Part (5) terminates.

T1 T2 ... Tn --- T1 ... Tn are code names of
    mutant types (see Appendix C). Mutants of the
    listed types will be considered for this and
    subsequent PIMS runs. Part (5) is then
    re-executed by PIMS.

SENSE T1 T2 ... Tn --- PIMS displays which of
    the listed mutant types are currently being
    considered and which are not. Part (5) is then
    re-executed by PIMS.

SENSE --- PIMS displays which of the possible
    mutant types are currently being considered and
    which are not. Part (5) is then re-executed by
    PIMS.

SELECT T1 T2 ... Tn --- The user specifies to
    PIMS which of the mutants he wishes to know about.
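
The reply handling of part (5) amounts to a small command loop, sketched below. The function name, the ON/OFF report format, the error text, and the type codes used in any call are hypothetical; the real code names are those of Appendix C. SELECT is omitted because its behavior is not fully described here.

```python
def handle_reply(reply, enabled, all_types):
    """Apply one part (5) reply; return (messages, done).

    enabled is the set of mutant types under consideration and is
    updated in place. done is True when part (5) terminates and False
    when it is to be re-executed.
    """
    words = reply.split()
    if words == ["NONE"]:
        return [], True
    if words == ["HELP"]:
        return sorted(all_types), False
    if words == ["ALL"]:
        enabled.update(all_types)
        return [], True
    if words and words[0] == "SENSE":
        asked = words[1:] or sorted(all_types)
        report = [f"{t} {'ON' if t in enabled else 'OFF'}" for t in asked]
        return report, False
    if not words or any(t not in all_types for t in words):
        return ["UNRECOGNIZED REPLY"], False  # hypothetical error text
    enabled.update(words)  # a plain list of type codes enables those types
    return [], False
```

A driver would call this repeatedly, displaying the returned messages, until `done` is True.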

(6) Display and Output of Past Results

In order to inform the user, in non-initial runs, that part
(6) has begun, PIMS displays the following message:

REVIEW PREVIOUS RUN RESULTS

At this point the user has three options: (1) he may
request that certain information concerning the status
before this run be displayed, (2) he may request that
similar information be included in his output file, or (3)
he may request that the mutation phase of PIMS be started.
Option three is requested by typing MUTATE. The other two
options cause this display to be re-executed, accomplished
by a repeated displaying of the "REVIEW...." message. The
information which can be displayed or included in the output
file is the following:

(a) Displaying Information

All requests to display information on the screen
begin with the word DISPLAY. Next there is a space
followed by a keyword which describes the information
to be put on the screen. The keywords are the
following:

HEADER --- The program subroutine statement and
    the classification of the program's formal
    parameters are displayed.

CORRECTNESS --- The method of determining mutant
    correctness and, possibly, the subroutine
    statement of the predicate subroutine are
    displayed.

TITLE --- The PIMS run title is displayed.

STATUS --- The mutants' status before this run
    is displayed.

(b) Outputting Information

All requests to have information included in the
user's output file begin with the word OUTPUT. Next
there is a space followed by a keyword describing the
type of information to be included in the output
file. The keywords are the following:

TESTCASES --- The previous test cases are
    included.

TESTCASE n --- The specified case, "n", is
    displayed.

The Mutation Phase

There is no user interaction during the mutation phase. In
the event of a fatal processing error the host operating
system will issue appropriate diagnostic messages.

The Post-Run Phase

The Post-Run Phase consists of one part which is
similar to part (6) (Display and Output of Past Results) of
the Pre-Run Phase. It can be called "Display and Output of
New Results." In order to inform the user that the Post-Run
Phase has begun, PIMS displays the following message:

POST-RUN PHASE

At this point the user has three options: he may request
that certain information concerning the mutant results for
this run and the mutant status after this run be displayed,
he may request that similar information as well as mutant
program listings be included in his Output File, or he may
request that the PIMS run terminate. The first two options
will be described below. The third option is requested by
typing STOP, and in each of the former two options the
Post-Run Phase re-cycles by PIMS displaying the following
message:

POST RUN RESULTS

The information which can be displayed or included in the
output file is the following:




(1) Display of Information

All requests to display information on the screen begin
with the word DISPLAY. Next there is a space followed
by a keyword describing the information to be put on
the screen. The keywords are the following:

(i) HEADER - Same as in the Pre-Run Phase.

(ii) CORRECTNESS - Same as in the Pre-Run Phase.

(iii) TITLE - The PIMS run title is displayed.

(iv) RESULTS - The mutant results for this run are
displayed.

(v) STATUS - The mutants' status after this run is
displayed.

(2) Output of Information

All requests to have information included in the user's
output file begin with the word OUTPUT. Next there is
a space followed by a keyword describing the type of
information to be included in the output file. The
keywords are the following:

(i) TESTCASES - The new test cases are included.

(ii) MUTANTS - This keyword may be followed by
additional keywords as follows: (The absence of
keywords implies the ALL keyword.)




(a) ALL - A listing of all the live mutants is
included.

(b) RANDOM - A listing of one randomly selected
live mutant of each possible mutant type having
live mutants is included.

(c) ALL T1 T2 ... Tn - A listing of all the live
mutants for each of the given types is included.

(d) RANDOM T1 T2 ... Tn - A listing of one
randomly selected live mutant for each of the
given mutant types is included.

(e) HELP - PIMS displays the code names of the
mutant types as described in Appendix C.

The default for all mutant types is no listing. Once a
user decides to list a mutant type, via either an ALL
or a RANDOM, he cannot later switch to no listing for
the type. However, he may switch from ALL to RANDOM or
from RANDOM to ALL for any mutant type.
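
The ALL and RANDOM listing options can be sketched as a selection over the live mutants of each type. The data layout (a dictionary from type code to a list of live mutant identifiers), the function name, and the type codes are assumptions made for illustration.

```python
import random


def mutants_to_list(live, mode, types=None, rng=None):
    """Choose which live mutants go into the output file.

    live maps a mutant type code to the list of its live mutants.
    mode is "ALL" or "RANDOM". types restricts the listing to the given
    type codes; None means every type that has live mutants.
    """
    if mode not in ("ALL", "RANDOM"):
        raise ValueError("mode must be ALL or RANDOM")
    rng = rng or random.Random()
    chosen_types = sorted(live) if types is None else types
    listing = {}
    for t in chosen_types:
        mutants = live.get(t, [])
        if not mutants:
            continue  # a type with no live mutants contributes nothing
        if mode == "ALL":
            listing[t] = list(mutants)
        else:
            listing[t] = [rng.choice(mutants)]  # one randomly selected mutant
    return listing
```

Passing a seeded `random.Random` makes the RANDOM selection repeatable, which can be convenient when comparing reports between runs.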




CHAPTER  IV 

IMPLEMENTATION  AND  PORTABILITY  DISCUSSIONS 

Implementation

The PIMS program is written in FORTRAN as a
feasibility study of automatic program mutations. In other
words, we addressed the question: "Can the concept of
program mutation be implemented in an automated system with
reasonable runtime and computational simplicity?" The top
levels are depicted in the following diagram.

Driver---PRERUN---CLPTTY, CIFILE, CCFILE, CTFILE, CMFILE,
   |              DISPLY, FILECr, GETNAK, r.ETTYr, LTINrO,
   |              RFCS, NEWTST, WPAST, l-'rSTAT
   |
   |-----PHASE----CLPTTY, DISPLY, vFR(itD, t'FKGfl', X\TWf-U,
   |              YOLDMU
   |
   |-----POSTRN---CLPTTY, CLF*UP, DISPLY, vTW&ES, wVEw

Beyond these levels, the control paths are relatively
difficult to analyze from a maintenance programmer's point
of view. The system data structures are almost entirely
parametric and allocation is contained in multiple COMMON
blocks. The large number of these blocks and their
extensive use permits many side-effects to take place as the
result of procedures invoked at every level. These side
effects also greatly complicate the issues concerning the
scope-of-control of procedures over their variables.




PIMS executes in a paged environment as a single
image with about 67K bytes of address space required for
both the data and the executable code. Since the program is
logically divided into three distinct phases, the address
space could be reduced with little impact on execution or
operation by implementing the task as overlays or
separately executed programs. This approach is recommended
for implementing the PIMS program on most minicomputers.

Portability 

During the implementation of PIMS on the PRIME-400,
much collaboration took place between the research groups at
Yale University and at Georgia Tech. My effort used a
sixteen-bit machine, the PRIME-400, while the Yale effort
used a 36-bit machine, the DECsystem-10. The only medium
available for transporting programs and data between these
two systems was nine-track magnetic tape.

Although both vendors claimed to support ANSI
compatible magnetic tapes, files could not be written by one
system and read by the other without some form of
intermediate processing. A list of this processing
includes:

1) Records which were written with 80 characters per
record and one record per block used different methods
of indicating the end-of-record. Specifically, DEC
wrote an 80-character record with two trailing nulls
(binary zeroes) while PRIME expected an 80-character
record which included the two nulls.

2) Disk files with embedded carriage-return/line-feed
sequences caused general havoc on both systems. The
line-feeds usually had to be removed before any
progress was made in processing the files.
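
The kind of intermediate processing listed above can be sketched as a record normalizer. This is a modern illustration, not the procedure actually used: the 80-character record length, the choice to pad with blanks, and the function name are assumptions.

```python
RECORD_LENGTH = 80  # assumed fixed record length


def normalize_record(raw):
    """Return a fixed-length record with end-of-record debris removed.

    Trailing NUL bytes and carriage-return/line-feed characters are
    stripped, then the record is padded with blanks to RECORD_LENGTH.
    """
    text = raw.rstrip(b"\x00\r\n")
    if len(text) > RECORD_LENGTH:
        raise ValueError("record longer than expected")
    return text.ljust(RECORD_LENGTH, b" ")
```

Normalizing every record to one agreed form before it reaches either system removes the dependence on each vendor's end-of-record convention.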

A second and larger set of problems was encountered
when FORTRAN source files were moved between the two
systems. Obvious problems developed as a result of the
differing word lengths and associated integer magnitude.
The impact of many of these problems was lessened by the
PRIME FORTRAN declaration for long integers, INTEGER*4,
which specifies a 32-bit integer. In order to perform the
same functions on diverse computers, it would be necessary to
constrain all implementations. A discussion of those
constraints is presented below.

First, all integer quantities should be kept within
the range -32,767 to +32,767. This would allow the system
to function on sixteen-bit machines that do not provide any
long integer forms.
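
The first constraint can be made checkable in code. The sketch below is illustrative only; the helper names are hypothetical, and a FORTRAN implementation would have to perform the check before the arithmetic rather than after it, since the overflow itself is the problem on a sixteen-bit machine.

```python
LIMIT = 32767  # portable magnitude limit stated in the text


def fits_sixteen_bit(value):
    """True if value lies in the portable range -32,767 to +32,767."""
    return -LIMIT <= value <= LIMIT


def checked_add(a, b):
    """Add two integers, refusing results outside the portable range."""
    total = a + b
    if not fits_sixteen_bit(total):
        raise OverflowError("result outside -32,767 to +32,767")
    return total
```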

Second, the packing of multiple fields of data per
integer variable should be avoided. This packing is also
inefficient for the large word machines, but there are
severe unpacking problems for the sixteen-bit machines. In
our case, a 36-bit word was used on the DECsystem-10 to
contain two nine-bit and one eighteen-bit fields. We were
able to implement this using a long integer and obtain two
eight-bit and one sixteen-bit fields. This loss in
magnitude has not caused problems to date. Extended range
can be obtained by segmenting the use of the system to
process smaller programs.
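
The 32-bit packing described above can be sketched as follows. The order of the fields within the word is an assumption (the text gives only the field widths), and the function names are hypothetical.

```python
def pack(a, b, c):
    """Pack two 8-bit fields (a, b) and one 16-bit field (c) into 32 bits.

    Assumed layout: a in bits 31-24, b in bits 23-16, c in bits 15-0.
    """
    if not (0 <= a < 256 and 0 <= b < 256 and 0 <= c < 65536):
        raise ValueError("field out of range")
    return (a << 24) | (b << 16) | c


def unpack(word):
    """Recover the three fields from a word produced by pack."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, word & 0xFFFF
```

The range check in `pack` shows the loss in magnitude directly: a nine-bit field could hold values up to 511, while the eight-bit replacement stops at 255.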

Third, the character processing that is done in the
compiler and command processor should be done with
integer tokens subject to the first constraint. If integer
tokens are not desirable, then at least characters should be
processed in FORTRAN A1 format. The A1 processing will
decrease the efficiency for the large word machines again,
but the character routines will be portable. A good machine
independent "string subroutine" package would probably be a
better choice here.

Fourth, vendor supplied features and all I/O should
be imbedded in user written procedures. In some cases, this
will merely add a layer of run-time linkage with the
parameters to the user routine being passed directly to the
vendor feature or I/O routine. However, this layer allows
other system routines to be substituted and code added to
provide for the behavior of another technique. Modifying a
single imbedded routine is much easier than searching for
all of the uses of a specific statement throughout an entire
system implementation.

We believe that the above techniques should be used
in future implementation efforts. They were not used in our
system but the benefits of these techniques became apparent
as we tried to pass more and more FORTRAN source decks
between machines.




CHAPTER  V 

SUMMARY 

A program mutation system has been built and is now
operational on a PRIME-400 minicomputer. The user may
specify an input data file that contains a subroutine which
is valid in a certain subset of the FORTRAN programming
language. This subroutine is parsed and interpreted with
user specified test data, and the user is given the
opportunity of determining the correctness of this test data
either manually or through the use of a predicate subroutine
that will determine the correctness of this base routine.

Once the user thinks he has an adequate test data
set, this base program is modified in several ways and
executed again after each modification. These modifications
are called mutants and each mutant will either survive or
die during its execution. All mutants that produce
incorrect results or will not be valid subset programs will
die. Those mutants that produce correct results will
indicate to the user that further analysis is needed.
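
The survive-or-die cycle can be illustrated on a toy scale. The sketch below is not PIMS: it treats a one-line "subroutine" computing a op b as the base program and uses replacement of the arithmetic operator as the only mutation; the names and the choice of operators are assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_program(op):
    """Build a tiny 'subroutine' that computes a op b."""
    return lambda a, b: OPS[op](a, b)


def live_mutants(base_op, test_data):
    """Return the operator mutants that agree with the base program on all test data."""
    base = make_program(base_op)
    live = []
    for op in OPS:
        if op == base_op:
            continue  # the base program is not a mutant of itself
        mutant = make_program(op)
        if all(mutant(a, b) == base(a, b) for a, b in test_data):
            live.append(op)  # survived every test case
    return live
```

With the single test case (2, 2) the "*" mutant of an addition survives, because 2 + 2 and 2 * 2 agree; adding the case (1, 3) kills it, which is the effect of augmenting the test data set.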

When further analysis is necessary the user must
determine that either a live mutant is equivalent with other
mutants and discard the equivalent mutants manually, or a
live mutant might be eliminated by augmenting the test data
set. The test data set is then modified as required and the
mutants that remain live are executed again. Each time, the
program will report various statistics about the live
mutants remaining. When the user is satisfied with the
completeness of data achieved, the process stops. We now
say that an acceptable level of test data adequacy has been
reached. In more quantitative terms, some percentage of the
total mutants will remain live at this point. For tests of
the same subroutine, we interpret this percentage to mean:
the test data set that leaves the lowest number of live
mutants at the time of comparison is the most adequate.
We use this measure of test data adequacy to infer that the
subroutine with the more adequate test data has been tested
more thoroughly. In addition, the subroutine that has been
tested more thoroughly is more probably the most correct.

Conclusion

Program Mutation is a valuable asset in program
testing. The methodology greatly reduces the time and
effort required to find errors in those programs studied
thus far. Although there is a wealth of FORTRAN software in
the world today, it is difficult to obtain and modify
real-world software for analysis. This problem is the topic
of current research.

The PIMS system discussed above was not designed with
such properties as portability and maintainability in mind.
I would like to suggest that these topics are suitable for
investigation in their own right. The idea of program
mutation is currently being extended to full ANSI-1966
FORTRAN, COBOL, and PASCAL with the hope that, by examining
the effects of the methodology in several programming
languages, some insight may be obtained into the
methodologies of program testing in general.




APPENDIX A

ERROR  MESSAGES 

This document describes the errors detected by PIMS
during the interactive phases of a PIMS run, the messages
displayed by PIMS on detection of an error, and the actions
taken by PIMS after finding an error. The errors are
divided into two classes: fatal and non-fatal, with fatal
errors resulting in an abort of the PIMS run. Fatal errors
occur only during the Pre-Run Phase of PIMS. The occurrence
of a fatal error, or the user entering KILL during the
Pre-Run Phase of PIMS, leaves all transient files as they
were before the PIMS run began. Once the user issues a GO
command, thus signalling the end of the Pre-Run Phase, he
will not be able to issue a KILL.

We also group the errors into those which occur
during parsing of the program and those which occur strictly
as bad responses to prompts made by PIMS. Parser errors are
always fatal, and the PIMS parser is designed to abort on
detection of a first error. That is, if the program has
multiple syntactic errors, the PIMS parser will print an
error message for only the first of them.

NOTE:

The user should never end a PIMS run by a CTRL-P.
The CTRL-P termination will cause further PIMS runs to
perform unpredictably.




Parser Error Messages


The messages which the user may encounter during the
parsing of either the routine which is being tested or of
the predicate subroutine are very similar to those generated
by any FORTRAN compiler. Since a knowledge of FORTRAN is
prerequisite to a meaningful use of the PIMS system, an
understanding of typical compiler diagnostics is assumed.

One aspect of the PIMS diagnostics that is different
from that of a typical compiler is the fatal nature of
compiler diagnostics. In the event that any compile-time
error is encountered within a module, the error is reported
at that point and the compile is aborted. There is no
attempt at compile-time error trace-back or recovery. If
any errors are reported, the user is advised to scan the
remainder of the routine manually for other syntax errors
prior to a resubmission to PIMS. This manual scan should
save man and machine time during the Pre-Run Phase.




Interactive Error Messages


Pre-Run Phase

(a) The Program File Name

(1) Message: ILLEGAL FILE NAME
    Action:  Repeat part (a).

(2) Message: NON-EXISTENT RAW PROGRAM FILE
    Action:  Repeat part (a).

(3) Message: FILE NAME CONFLICT - OUTPUT FILE
             ALREADY EXISTS
    Action:  None. Serves as a warning.

(b) The Run Type

(1) Message: ILLEGAL REPLY
    Action:  Repeat part (b).

(2) Message: PROGRAM NOT IN THE PIMS FORTRAN SUBSET
    Action:  Abort

(3) Message: THE FOLLOWING TRANSIENT FILES ARE
             MISSING:
    Action:  Abort

(4) Message: THE FOLLOWING TRANSIENT FILES ALREADY
             EXIST:
    Action:  Abort


(c) Program and Test Cases

(1) Message: ILLEGAL CLASSIFICATION
    Action:  (A) Display the legal classification codes.
             (B) Repeat part (c-1) on the same parameter.

(2) Message: ILLEGAL REPLY
    Action:  Repeat Program and Test Cases.

(3) Message: PREDICATE SUBROUTINE FILE DOES NOT
             EXIST
    Action:  Abort

(4) Message: BAD PREDICATE SUBROUTINE CALLING
             SEQUENCE
    Action:  (A) Display the program's formal parameters
                 and their classifications.
             (B) Display the predicate subroutine
                 statement.
             (C) Abort

(5) Message: ILLEGAL VALUE
    Action:  Repeat the request for data on the same
             input formal parameter(s). User's input ignored.

(6) Message: NOT ENOUGH DATA SUPPLIED
    Action:  Repeat the request for data on the same
             input formal parameter(s). User's input ignored.


(d) Test Cases

(1) Message: ILLEGAL VALUE
    Action:  Repeat the request for data on the same
             input formal parameter(s). User's input ignored.

(2) Message: NOT ENOUGH DATA SUPPLIED
    Action:  Repeat the request for data on the same
             input formal parameter(s). User's input ignored.


(e) Definition of and Status of Mutant Types Considered


(1) Message: ILLEGAL REPLY
    Action:  (A) Display all legal replies.
             (B) Repeat part (e).

(2) Message: ILLEGAL MUTANT TYPE
    Action:  (A) Display the coded names of the mutant
                 types.
             (B) Repeat part (e).

(3) Message: THESE MUTANT TYPES WERE ALREADY ON:
    Action:  None. Serves as a warning. The other
             specified mutant types which were off are now on.


(f) Displaying and Outputting Results
(1) Message: ILLEGAL REQUEST
    Action:  (A) Display the legal requests for part (f).
             (B) Repeat part (f).


(g) General Errors

(1) Message: PROGRAM FAILS
    Action:  (A) Display the test case on which it fails.
             (B) Display the way in which it failed.
             (C) Put (A) and (B) in the output file.
             (D) Abort

(2) Message: PREDICATE SUBROUTINE FAILS
    Action:  (A) Display the test case on which it fails.
             (B) Display the way in which it fails.
             (C) Put (A) and (B) in the output file.
             (D) Abort

(3) Message: OUTPUT FILE EXISTS - TYPE KILL OR
             CONTINUE.
    Action:  Abort on KILL, delete output file on
             CONTINUE.

(4) Message: TOO MUCH DATA OR FAULTY SYNTAX
    Action:  Repeat the previous prompt for numeric
             input.

(5) Message: ILLEGAL VALUE
    Action:  Repeat the previous prompt for numeric
             input.




The Post-Run Phase

(a) Message: ILLEGAL REQUEST
    Action:  (A) Display the legal requests for the
                 Post-Run Phase.
             (B) Repeat the Post-Run Phase.

(b) Message: ILLEGAL MUTANT TYPE
    Action:  (A) Display all legal mutant types.
             (B) Repeat the Post-Run Phase.




APPENDIX  B 

FORTRAN  LANGUAGE  SUBSET 

This appendix describes the FORTRAN subset language
whose programs can be tested using the Pilot Mutation
System. Only the syntax of this subset, specified in an
extended BNF (see below), is given. The syntax presented is
in a "pure" form with the mundane aspects of FORTRAN syntax
assumed. These include the following: 1) statements start
on a new line and appear in "card" columns 7-72, 2) column 6
is the statement continuation column, 3) statement labels
appear in columns 1-5, 4) names have lengths of 6 or fewer
characters, and 5) comment statements have a C in column 1.
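
The card layout assumed above can be made concrete with a short sketch. It is written in Python rather than FORTRAN for brevity and is not taken from PIMS; the function name `split_card` and the dictionary it returns are assumptions made here. The rule that a blank or zero in column 6 means "not a continuation" is the usual FORTRAN card convention rather than something this appendix states.

```python
def split_card(line):
    """Split one source line into the card fields listed above.

    Columns are 1-based in the text, so column 1 is index 0 here.
    """
    card = line.rstrip("\n").ljust(72)[:72]   # text past column 72 is ignored
    if card[0] == "C":                        # 5) comment: C in column 1
        return {"kind": "comment", "label": "", "text": card[1:].rstrip()}
    # 2) column 6 is the continuation column (blank or 0: a new statement)
    kind = "statement" if card[5] in " 0" else "continuation"
    return {
        "kind": kind,
        "label": card[0:5].strip(),           # 3) labels in columns 1-5
        "text": card[6:].rstrip(),            # 1) statement in columns 7-72
    }
```

For example, `split_card("  100 CONTINUE")` yields the label `100` and the statement text `CONTINUE`.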

The PIMS FORTRAN subset has the following two
semantical restrictions: (1) all variables must be
declared, and (2) keywords, such as TO and END, cannot be
used as variable names.
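
A minimal sketch of how these restrictions, together with the six-character limit on names, might be checked is given below in Python. It is illustrative only: `is_valid_name`, `undeclared`, and in particular the contents of `KEYWORDS` are assumptions made here, since the text names only two keywords as examples.

```python
import re

# A name is a letter followed by up to five letters or digits.
NAME = re.compile(r"[A-Z][A-Z0-9]{0,5}")

# Assumed keyword list: only keywords short enough to collide with a name.
KEYWORDS = {"IF", "GO", "TO", "GOTO", "DO", "END", "RETURN"}


def is_valid_name(name):
    """True if `name` is a legal variable name under the assumptions above."""
    return NAME.fullmatch(name) is not None and name not in KEYWORDS


def undeclared(used, declared):
    """Names that are used but never declared, in sorted order."""
    return sorted(set(used) - set(declared))
```

For the variables `N`, `I`, and `T`, `undeclared({"I", "T", "N"}, {"N", "I"})` reports that `T` still needs a declaration.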


Language Subset Overview

The subset of the FORTRAN language chosen for this
implementation of PIMS is such that INTEGER processing of
numeric data is possible. A program must be a SUBROUTINE
subprogram with an optional parameter list. Parameters and
other variables must be declared using INTEGER or DIMENSION
declarations. Arrays may be either one or two dimensional
and may be specified in the INTEGER statement.

The acceptable control structures include the
logical-IF, GOTO, nested-DO, CONTINUE, and RETURN.
Arithmetic expressions may include any of the operators: +,
-, *, / or **. Logical expressions are restricted to the IF
statement and must be one of: .AND., .OR., or .NOT. as
used in many FORTRAN systems. Numeric values should be kept
within the range -32,767 to +32,767 due to the nature of the
PRIME-400 sixteen bit architecture.
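
As an illustration of the operators and the numeric range just described, the following Python sketch evaluates integer expressions using the precedence levels given by the BNF later in this appendix, taken literally as printed: `+` and `-` bind loosest, then `*` and `/`, then `**`, with unary minus binding tightest, and every binary operator grouped left to right. It is not the PIMS parser. The names `tokenize`, `check`, and `evaluate` are assumptions; array references are omitted, negative exponents are rejected, and division is assumed to truncate toward zero.

```python
import re

LIMIT = 32767                      # range stated in the text
TOKEN = re.compile(r"\s*(\*\*|[-+*/()]|\d+|[A-Z][A-Z0-9]*)")


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at column {pos + 1}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def check(value):
    """Reject values outside -32,767 .. +32,767."""
    if not -LIMIT <= value <= LIMIT:
        raise OverflowError(f"{value} is outside the sixteen bit range")
    return value


def evaluate(text, env=None):
    tokens, env, pos = tokenize(text), env or {}, 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():                    # [<arithmetic-expression> {+|-}] <ae3>
        value = ae3()
        while peek() in ("+", "-"):
            op, rhs = take(), ae3()
            value = check(value + rhs if op == "+" else value - rhs)
        return value

    def ae3():                     # [<ae3> {*|/}] <ae2>
        value = ae2()
        while peek() in ("*", "/"):
            op, rhs = take(), ae2()
            if op == "*":
                value = check(value * rhs)
            else:                  # integer division, truncating toward zero
                quotient = abs(value) // abs(rhs)
                value = quotient if (value < 0) == (rhs < 0) else -quotient
        return value

    def ae2():                     # [<ae2> **] <ae1>
        value = ae1()
        while peek() == "**":
            take()
            rhs = ae1()
            if rhs < 0:
                raise ValueError("negative exponent not handled in this sketch")
            value = check(value ** rhs)
        return value

    def ae1():                     # <primitive-ae> | - <ae1> | ( <expr> )
        tok = take()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok == "-":
            return -ae1()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("missing )")
            return value
        if tok.isdigit():
            return check(int(tok))
        return check(env[tok])     # scalar variable looked up by name

    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value
```

Under this literal reading, `2**3**2` groups as `(2**3)**2` and `-2**2` as `(-2)**2`; a production FORTRAN compiler may well treat both differently, so the grouping should be regarded as a property of the sketch rather than of PIMS.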

BNF Description of the Language Subset

Standard BNF is augmented with the following four
abbreviations:

(1) list appendix - <y> ::= <x-list> is equivalent to

    <y> ::= <x> | <x> <y>

(2) commalist appendix - <y> ::= <x-commalist> is
    equivalent to

    <y> ::= <x> | <x> , <y>

(3) option - <y> ::= <x> [ <z> ] is equivalent to

    <y> ::= <x> | <x> <z>

(4) choice - <y> ::= <x> { <w> | <z> } is equivalent to

    <y> ::= <x> <w> | <x> <z>

Programs

<program> ::= SUBROUTINE <program-name> (
              <formal-argument-commalist> )
              <declaration-statement-list>
              <executable-statement-list>
              END

Formal Arguments

<formal-argument> ::= <variable-name>

Declaration Statements

<declaration-statement> ::= INTEGER <declaration-commalist>

<declaration> ::= <scalar-decl> | <array-decl>

<scalar-decl> ::= <variable-name>

<array-decl> ::= <one-dim-array-decl> | <two-dim-array-decl>

<one-dim-array-decl> ::= <variable-name> ( <limit> )

<limit> ::= <positive-integer> | <variable-name>

<two-dim-array-decl> ::= <variable-name> ( <limit-pair> )

<limit-pair> ::= <limit> , <limit>

Executable Statements

<executable-statement> ::= [<label>] <statement>

<label> ::= <positive-integer>

<statement> ::= <simple-statement> | <conditional-statement>
                | <do-loop-statement>

Simple Statements

<simple-statement> ::= <goto-statement> |
                       <assignment-statement> |
                       <continue-statement> | <return-statement>

<goto-statement> ::= {GO TO | GOTO} <label>

<assignment-statement> ::= <reference> =
                           <arithmetic-expression>

<continue-statement> ::= CONTINUE

<return-statement> ::= RETURN

Conditional Statements

<conditional-statement> ::= IF ( <logical-expression> )
                            <simple-statement>

Do-loop Statements

<do-loop-statement> ::= <index-part>
                        <outer-loop-body>
                        <loop-end>

<index-part> ::= DO <label> <index> = <initial> , <terminal>
                 [ , <increment> ]

<outer-loop-body> ::= <outer-loop-statement-list>

<outer-loop-statement> ::= [<label>]
                           { <simple-statement> |
                             <conditional-statement> |
                             <inner-do-loop> }

<inner-do-loop> ::= <index-part>
                    <loop-body>
                    [ <loop-end> ]

<loop-body> ::= <loop-statement-list>

<loop-statement> ::= [<label>]
                     { <simple-statement> | <conditional-statement> }

<loop-end> ::= <label> <loop-end-statement>

<loop-end-statement> ::= <continue-statement> |
                         <assignment-statement> |
                         <conditional-statement>

<index> ::= <scalar-reference>

<initial> ::= <scalar-reference> | <positive-integer>
<terminal> ::= <scalar-reference> | <positive-integer>
<increment> ::= <scalar-reference> | <positive-integer>


Arithmetic Expressions

<arithmetic-expression> ::= [ <arithmetic-expression> {+ |
                            -} ] <ae3>

<ae3> ::= [ <ae3> {* | /} ] <ae2>

<ae2> ::= [ <ae2> ** ] <ae1>

<ae1> ::= <primitive-ae> | - <ae1> | (
          <arithmetic-expression> )

<primitive-ae> ::= <reference> | <integer>

Logical Expressions

<logical-expression> ::= [ <logical-expression> .OR. ] <le2>

<le2> ::= [ <le2> .AND. ] <le1>

<le1> ::= <primitive-le> | .NOT. <le1> | (
          <logical-expression> )

<primitive-le> ::= <arithmetic-expression> <relational-op>
                   <arithmetic-expression>

<relational-op> ::= .LT. | .LE. | .EQ. | .NE. | .GT. |
                    .GE.


References

<reference> ::= <scalar-reference> | <array-one-reference> |
                <array-two-reference>

<scalar-reference> ::= <variable-name>

<array-one-reference> ::= <variable-name> ( <simple-ae> )

<array-two-reference> ::= <variable-name> ( <simple-ae> ,
                          <simple-ae> )

<simple-ae> ::= [-] [<positive-integer> *]
                <scalar-reference> {+ | -}
                <positive-integer> |
                [-] <scalar-reference> |
                <positive-integer>

Identifier Names

<program-name> ::= <name>

<variable-name> ::= <name>

<name> ::= <letter> [ <alphameric-list> ]

<letter> ::= A | B | C | D | E | F | G | H | I | J | K | L |
             M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

<alphameric> ::= <letter> | <digit>

<digit> ::= <zero> | <positive-digit>

<zero> ::= 0

<positive-digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9




Constants

<constant> ::= <integer>

<integer> ::= <positive-integer> | <zero-list> | -
              <positive-integer>

<positive-integer> ::= <positive-digit> [ <digit-list> ]




APPENDIX  C 

COMMANDS  AND  ABBREVIATIONS 

This appendix describes the commands and their
abbreviations that are used to communicate with PIMS during
the interactive phases. The commands for specifying mutant
types follow.

The user specifies mutant types to PIMS by using the
following three character abbreviations. An abbreviation
marked * means the mutant type is not currently implemented.
There are no "full word" commands for specifying mutant
types.

(a) Data Declaration Mutations

(i) ALD - Array Limit Default Insertion

(ii) ALP - Two Dimensional Array Limit Permutation

(b) Data Reference Mutations

(i) CRP k - Constant Replacement. The value k >= 1 gives
the neighborhood (i.e., c +/- k) of the replacing
constants. The user may choose not to specify k, in
which case a default value of k=1 is assumed (see
Appendix D).

(ii) SVR - Scalar Variable Replacement

(iii) SFC - Scalar Variable for Constant Replacement

(iv) CFS - Constant for Scalar Variable Replacement

(v) CAR - Comparable Array Name Replacement

(vi) CFA - Constant for Array Reference Replacement

(vii) SFA - Scalar Variable for Array Reference
Replacement

(viii) AFC - Array Reference for Constant Replacement

(ix) AFS - Array Reference for Scalar Variable
Replacement

(x) AIP - Two Dimensional Array Index Permutation

*(xi) SVI k - Scalar Variable Initialization Insertion.
The value k >= 0 gives the +/- k set of initializing values.
The user may choose not to specify k, in which case a
default value of k = 0 is assumed.

(c) Operator Evaluation Mutations

(i) AOR - Arithmetic Operator Replacement

(ii) ROR - Relational Operator Replacement

(iii) LCR - Logical Connector Replacement

*(iv) APP - Arithmetic Precedence Permutation

*(v) LPP - Logical Precedence Permutation

(d) Control Mutations

(i) GLR - Goto Label Replacement

(ii) PAN - Path Analysis

(iii) CSI - Continue Statement Insertion

(iv) CSD - Continue Statement Deletion

*(v) ILD - Inner Do Loop Decoupling

*(vi) DIA - Do Loop Index Alteration

(vii) RSR - Return Statement Replacement




APPENDIX D

DESCRIPTION  OF  THE  MUTATIONS  PERFORMED 

This appendix describes the types of first order
mutations which the Pilot Mutation System considers and some
other mutations, marked with a *, which may be considered
in future extensions of PIMS. The wording used is tied to
the syntactic categories defined in the "FORTRAN Language
Subset" (see Appendix B). Only those programs which are in
the subset language are considered to be valid mutations.

Data Declaration Mutations

Array Limit Default Insertion is accomplished by
replacing each scalar reference in an array declaration by
1.

Two Dimensional Array Limit Permutation is
accomplished by exchanging each two dimensional array
declaration limit pair.
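
The two declaration mutations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PIMS code: an array declaration is represented as a name plus a list of limit strings, the function names `ald_mutants` and `alp_mutant` are invented here, and skipping a limit pair whose two limits are identical (because the exchange would reproduce the original program) is a choice made for the sketch.

```python
def ald_mutants(name, limits):
    """Array Limit Default Insertion: replace one scalar limit at a time by 1."""
    mutants = []
    for i, limit in enumerate(limits):
        if not limit.isdigit():                  # only scalar references change
            mutants.append((name, limits[:i] + ["1"] + limits[i + 1:]))
    return mutants


def alp_mutant(name, limits):
    """Two Dimensional Array Limit Permutation: exchange the limit pair."""
    if len(limits) == 2 and limits[0] != limits[1]:
        return (name, [limits[1], limits[0]])
    return None                                  # not applicable
```

For the declaration `A(N)`, `ald_mutants("A", ["N"])` produces the single mutant `A(1)`; for `B(N,10)`, `alp_mutant("B", ["N", "10"])` produces `B(10,N)`.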

Data Reference Mutations

The following sets are referenced in defining the
mutation operations in this section:

A -  set of all array references appearing in the
     program

C -  set of all constants appearing in executable
     statements of the program

K -  the set {-k,-k+1,...,-1,0,1,...,k-1,k} where k >= 0
     is supplied by the user

S -  set of all scalar variable names appearing in
     executable statements of the program

V1 - set of all one dimensional array names appearing
     in executable statements of the program

V2 - set of all two dimensional array names appearing
     in executable statements of the program

1. Constant Replacement

Each constant c appearing in any executable statement is
replaced by members of the set
{c-k,c-k+1,...,c-1,c+1,...,c+k}. If k = 0 is supplied, then
no constant replacements are produced by PIMS.

2. Scalar Variable Replacement

Each scalar variable s appearing in any executable statement
is replaced by members of S-{s}.

3. Scalar Variable for Constant Replacement

Each constant c appearing in any executable statement is
replaced by members of S.

4. Constant for Scalar Variable Replacement

Each scalar variable s appearing in any executable statement
is replaced by members of C.

5. Comparable Array Name Replacement

Each instance of v1 in V1 appearing in any executable
statement is replaced by members of V1-{v1}. Each instance
of v2 in V2 appearing in any executable statement is
replaced by members of V2-{v2}.

6. Constant for Array Reference Replacement

Each instance of ar in A appearing in any executable
statement is replaced by members of C.

7. Scalar Variable for Array Reference Replacement

Each instance of ar in A appearing in any executable
statement is replaced by members of S.

8. Array Reference for Constant Replacement

Each instance of c in C appearing in any executable
statement is replaced by members of A.

9. Array Reference for Scalar Variable Replacement

Each instance of s in S appearing in any executable
statement is replaced by members of A.

10. Two Dimensional Array Index Permutation

Each instance of a reference to a two dimensional array has
its indices permuted.

*11. Scalar Variable Initialization Insertion

For each s in S the initial value of s is set to members of
K.
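
Constant Replacement (item 1) and Scalar Variable Replacement (item 2) can be illustrated with a small Python sketch. It assumes a statement has already been split into a list of tokens containing no statement labels, and it produces first order mutants, that is, one changed token per mutant. The helper names `constant_replacements`, `first_order_mutants`, `crp`, and `svr` are assumptions made for this illustration and are not taken from PIMS.

```python
def constant_replacements(c, k=1):
    """The set {c-k, ..., c-1, c+1, ..., c+k} as an ordered list."""
    return [c + d for d in range(-k, k + 1) if d != 0]


def first_order_mutants(tokens, is_target, alternatives):
    """Replace one target token at a time by each of its alternatives."""
    mutants = []
    for i, tok in enumerate(tokens):
        if is_target(tok):
            for alt in alternatives(tok):
                mutants.append(tokens[:i] + [alt] + tokens[i + 1:])
    return mutants


def crp(tokens, k=1):
    """Constant Replacement with neighborhood k (k = 0 gives no mutants)."""
    return first_order_mutants(
        tokens,
        str.isdigit,
        lambda tok: [str(v) for v in constant_replacements(int(tok), k)],
    )


def svr(tokens, scalars):
    """Scalar Variable Replacement: each s is replaced by members of S-{s}."""
    return first_order_mutants(
        tokens,
        lambda tok: tok in scalars,
        lambda tok: sorted(scalars - {tok}),
    )
```

For the statement `SORTED = 1`, `crp(["SORTED", "=", "1"])` yields the two mutants `SORTED = 0` and `SORTED = 2`. The remaining replacement types in this section follow the same pattern with different target and alternative sets.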

Operator Evaluation Mutations

1. Arithmetic Operator Replacement

Each instance of a binary operator bo is replaced by members
of the set {+,-,*,/,**}-{bo}. Each instance of unary - is
eliminated.

2. Relational Operator Replacement

Each instance of a relational operator ro is replaced by
members of the set {.LT.,.LE.,.EQ.,.NE.,.GT.,.GE.}-{ro}.

3. Logical Connector Replacement

Each instance of .AND. is replaced by .OR., each instance
of .OR. is replaced by .AND., and each instance of .NOT.
is eliminated.

*4. Arithmetic Precedence Permutation

Each arithmetic expression containing >1 arithmetic operators
is replaced by each of its distinct alternative parses.

*5. Logical Precedence Permutation

Each logical expression containing >1 logical connectors is
replaced by each of its distinct alternative parses.
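
The three implemented operator mutations (items 1 to 3) can be sketched in Python as follows. The sketch works on a token list and assumes the tokenizer has already distinguished unary from binary minus; the token `NEG` used for unary minus, like the function name `operator_mutants`, is an assumption made here and not PIMS notation.

```python
ARITHMETIC = ["+", "-", "*", "/", "**"]
RELATIONAL = [".LT.", ".LE.", ".EQ.", ".NE.", ".GT.", ".GE."]
CONNECTOR = {".AND.": ".OR.", ".OR.": ".AND."}
DELETED = {".NOT.", "NEG"}        # "NEG" stands for unary minus in this sketch


def operator_mutants(tokens):
    """Return every first order operator mutant of a token list."""
    mutants = []
    for i, tok in enumerate(tokens):
        if tok in DELETED:                        # operator is eliminated
            mutants.append(tokens[:i] + tokens[i + 1:])
            continue
        if tok in ARITHMETIC:
            alternatives = [op for op in ARITHMETIC if op != tok]
        elif tok in RELATIONAL:
            alternatives = [op for op in RELATIONAL if op != tok]
        elif tok in CONNECTOR:
            alternatives = [CONNECTOR[tok]]
        else:
            alternatives = []
        for alt in alternatives:
            mutants.append(tokens[:i] + [alt] + tokens[i + 1:])
    return mutants
```

A single relational operator therefore gives rise to five mutants and a single binary arithmetic operator to four.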

Control Mutations

The following sets of statement labels are referenced
in defining the mutation operations in this section:

L -    set of all statement labels in the program.

TRAP - used to represent a statement which is
       guaranteed to cause a program interrupt.

1. Goto Label Replacement

Each instance of l in L in any goto statement is replaced by
members of L - {l}.

2. Path Analysis

Each simple statement (including those which are imbedded in
conditionals) and each conditional statement is replaced by
TRAP. Each index part of each do loop statement has a TRAP
inserted as its subsequent statement. This checks that each
control path is traversed at least once and can easily be
extended to see if each path is traversed any number of
times.

3. Continue Statement Insertion

Each do loop which does not end on a continue statement is
made to do so.

4. Continue Statement Deletion

Each do loop which ends on a continue statement is made to
end on the preceding statement.

*5. Inner Do Loop Decoupling

Inner do loops which end on the same statement as their
containing do loop are made to end on a separate, possibly
duplicated, statement.

*6. Do Loop Index Alteration

Although this type of mutation is not currently implemented
as a separate type, these mutations can be produced as a
result of data mutations (see above).

7. Return Statement Replacement

Each non-return simple statement (including those which are
imbedded in conditionals) and each conditional statement is
replaced by a return statement. Each index part of each do
loop statement has a return statement inserted as its
subsequent statement.
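
Goto Label Replacement (item 1) is simple enough to sketch directly on statement text. The Python fragment below is an illustration only: the function name `glr_mutants` is invented here, labels are passed in as strings, and the pattern assumes at least one blank between the GOTO keyword and its label even though FORTRAN itself does not require one.

```python
import re

# Matches "GOTO 300" or "GO TO 300" and captures the label.
GOTO = re.compile(r"\bGO ?TO +(\d+)")


def glr_mutants(statement, labels):
    """Replace the goto label l by each member of L - {l}."""
    match = GOTO.search(statement)
    if match is None:
        return []                                 # not a goto statement
    current = match.group(1)
    head, tail = statement[:match.start(1)], statement[match.end(1):]
    return [head + label + tail
            for label in sorted(labels, key=int)
            if label != current]
```

With the label set {100, 200, 300}, the statement `IF (N.LE.1) GOTO 300` has two mutants, one jumping to 100 and one to 200.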


APPENDIX  E 


ENTERING  AND  MODIFYING  FILES 

Programs are normally entered into the computer using
the PRIMOS[R] Text Editor (ED). This editor is a line
oriented text processor whose line pointer is always located
at the last line processed (whether the processing is
printing, locating, moving the pointer, etc.). The Editor
operates in one of two modes, INPUT mode or EDIT mode.

When creating a new file, the Editor is invoked by
typing

OK, ED

which places the Editor in the INPUT mode. When modifying
an existing file, the Editor is invoked by typing

OK, ED filename

which places the Editor in the EDIT mode. The "filename"
specified is the six-character name assigned to the raw
program file being created or modified. At any time, the
user may type a carriage return (c/r) with no other
characters preceding it. This is known as a "null
response." This null response will switch the Editor from
the EDIT mode to the INPUT mode or from INPUT mode to EDIT
mode.




INPUT Mode

The INPUT mode is used when entering text information
into a file (e.g., creating a program). The word INPUT is
displayed at the user's terminal to indicate that the Editor
has entered the INPUT mode. The c/r key will terminate the
current line of text and prepare the Editor to receive a new
line. Tabulation is accomplished with the backslash (\)
character. Each backslash represents the first, second,
etc. tab setting; the tab stops are at columns 7, 15, and
30. The use of c/r with no text preceding it puts the
Editor in EDIT mode.


EDIT Mode

The EDIT mode is used when the contents of a file are
to be modified. Many commands are available,
although we will only describe a subset of the available
commands that should suffice for most purposes. The
commands are described later in this appendix.

In the EDIT mode, the Editor maintains an internal
line pointer at the current line (the last line processed).
The commands TOP, BOTTOM, FIND, and LOCATE move this
pointer. The WHERE command displays the current line
number; POINT moves the pointer to a specified line number.
The MODE NUMBER command causes the line number to be
displayed whenever a line of text is displayed. All
commands for location and modification begin processing with
the current line. The use of c/r with no text preceding it
puts the Editor in INPUT mode.

Typographical Error Correction

In either mode the user may correct errors in typing
before the terminating (c/r) is typed. The last character
entered is deleted, moving from right to left, one character
for each backspace (b/s) typed. The entire current line may
be deleted by typing the delete (del) character. The
character (b/s) is obtained by holding the key marked "CTRL"
or "CONTROL" and then striking the key "H." Any line
followed by the delete character is null, and a (c/r) at
that point will switch the Editor into the alternate mode.

Saving Files

Orderly termination of an Editor session is done from
the EDIT mode. The command:

FILE filename

writes the current version of the edited file to the disk
under the specified filename. The file will be created if
it did not previously exist or it will be overwritten if it
does exist. If an existing file is being modified, the
command:

FILE

writes the new version to the disk with the old filename.
After the execution of the FILE command, the Editor is
terminated and control returns to PRIMOS, signified by
"OK," on the user terminal.

Other Useful Techniques

The following general descriptions will aid the user
in adapting to the PRIMOS Editor.

Any number of lines may be moved from one location to
another using the DUNLOAD command. DUNLOAD deletes these
lines as it writes them into an auxiliary file. A LOAD
command loads the new auxiliary file data at the desired
point. Any number of lines may be copied from one location
to another using the UNLOAD command. UNLOAD works the same
as DUNLOAD except that UNLOAD does not delete the lines as
they are being written.

Any line that begins with a legal FORTRAN statement
number may be located with the FIND command.

The MODIFY command is used when a line must be
altered but the relative column alignment must remain the
same.


EDITOR Command Summary


The following is an alphabetical list of some of the
available Editor commands. For a detailed description of
all commands, the user is referred to the Editor Reference
Section of THE NEW USER'S GUIDE TO EDITOR AND RUNOFF [1].
In the following descriptions, the parameter "string" is any
series of ASCII characters including leading, trailing, or
embedded blanks.


APPEND string ............ Appends string to the end of
                           the current line.

BOTTOM ................... Moves the pointer beyond the
                           last line of the file.

CHANGE/st1/st2/[n] [G] ... Replaces st1 with st2 for n
                           lines. If G is omitted, only
                           the first occurrence of st1 on
                           each line is changed; if G is
                           present, all occurrences on
                           the n lines are changed.

DELETE [n] ............... Deletes n lines, including the
                           current line. The default
                           value of n is one.

DUNLOAD filename [n] ..... Deletes n lines from the
                           current file and writes them
                           into filename. The default
                           value of n is one.

FILE [filename] .......... Writes the contents of the
                           current file into filename and
                           quits to PRIMOS.

FIND string .............. Moves the pointer to the first
                           line beginning with string.

INSERT string ............ Inserts the string after the
                           current line.

LOAD filename ............ Loads text from filename into
                           the current file following the
                           current line.

LOCATE string ............ Moves the pointer forward to
                           the first line containing
                           string. The string may
                           contain leading and trailing
                           blanks.

MODE NUMBER .............. Displays line numbers in front
                           of displayed lines.

MODE NNUMBER ............. Turns off the line number
                           display.

NEXT [{+|-}] [n] ......... Moves the pointer n lines,
                           forward if n is positive and
                           backward if n is negative.

POINT [n] ................ Moves the pointer to line n.

PRINT [n] ................ Displays the current line or n
                           lines beginning with the
                           current line.

QUIT ..................... Terminates the editing session
                           without filing the current
                           file.

RETYPE string ............ The current line is replaced by
                           string.

TOP ...................... Moves the pointer one line
                           before the first line of text.

UNLOAD filename [n] ...... Copies n lines into filename.

WHERE .................... Displays the current line
                           number.
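
The effect of the G option of CHANGE is perhaps easiest to see in a small model. The Python sketch below is not the PRIMOS Editor; it models a file as a list of strings, and the function name `change`, the zero-based `current` index, and the default of one line when n is omitted are all assumptions made for illustration.

```python
def change(lines, current, st1, st2, n=1, g=False):
    """Model of CHANGE/st1/st2/[n] [G] starting at the current line.

    Without G only the first occurrence of st1 on each line is replaced;
    with G every occurrence on each of the n lines is replaced.
    """
    count = -1 if g else 1                  # str.replace: negative means all
    stop = min(current + n, len(lines))     # do not run past the last line
    for i in range(current, stop):
        lines[i] = lines[i].replace(st1, st2, count)
    return lines
```

Applied to the two lines `A = A + 1` and `B = A + A`, `change(lines, 0, "A", "X")` alters only the first `A` of the first line, while `change(lines, 0, "A", "X", n=2, g=True)` alters every `A` on both lines.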

ot hgr  taBifeiiiliss  Syisids  Its  IBII25 


From time to time the user will probably wish to view
the contents of a file, delete an existing file or change
the name of an existing file. These capabilities exist
outside of the Editor facilities. In order to view a file
at the user's terminal, the user types

     OK, SLIST filename

where filename is the name of the file to be listed. Upon
completion of the listing, control is returned to PRIMOS.
Files may be deleted with the PRIMOS command

     OK, DELETE filename

where filename is the name of the file to be deleted. A
user may not delete a file that he does not own or that has
been appropriately protected.

Files may be renamed with the PRIMOS command

     OK, CNAME oldname newname

where oldname is the current name of the file and newname is
the desired new file name. A user may not rename a file
that he does not own or that has been appropriately
protected.




APPENDIX  F 

SAMPLE  PIMS  RUN 


The following is a copy of the terminal dialog from an
initial PIMS run. Some of the lines were changed to fit
them on the page, but the information presented is
unchanged.


OK, SEG RUN>PIMS
PRE-RUN PHASE

ALL INPUT MUST BE IN UPPER CASE

ENTER THE RAW PROGRAM FILE NAME
JRSTC'2

CATEGORIZE FORMAL PARAMETER N


PROG
  1        SUBROUTINE S0RTr?(N,A)
  2  C     BUBBLE SORT - ALLOW EARLY TERMINATION
  3        INTEGER N,A(N)
  4        INTEGER I
  5        INTEGER T
  6        INTEGER SORTED
  7  C
  8        IF (N.LE.1) GOTO 300
  9  100   CONTINUE
 10        SORTED = 1
 11        DO 200 I = 2,N
 12        IF (A(I-1).LE.A(I)) GOTO 200
 13        T = A(I-1)
 14        A(I-1) = A(I)
 15        A(I) = T
 16        SORTED = 0
 17  200   CONTINUE
 18        IF (SORTED.NE.1) GOTO 100
 19  300   CONTINUE
 20        RETURN
 21        END

TYPE  NEXT  COMMAND 
INPUT 

CATEGORIZE FORMAL PARAMETER A
IO

IS MUTANT CORRECTNESS DEPENDENT ON A PREDICATE SUBROUTINE ?
TYPE A YES OR NO ****

NO




HOW MANY TEST CASES ARE TO BE SPECIFIED ?

2

SPECIFY TEST CASE 1

ENTER VALUES FOR
N

5

ENTER 5 VALUES FOR ARRAY A

1 2 3 4 5

TEST CASE NUMBER 1
PARAMETERS ON INPUT
N = 5

PARAMETERS ON OUTPUT
A( 1) = 1
A( 2) = 2
A( 3) = 3
A( 4) = 4
A( 5) = 5

THE RAW PROGRAM TOOK 1 f- STEPS TO EXECUTE THIS TEST CASE
HIT RETURN TO CONTINUE


PLEASE VERIFY THAT DATA IS CORRECT
TYPE A YES OR NO ****

YES

SPECIFY TEST CASE 2

ENTER VALUES FOR
N

5

ENTER 5 VALUES FOR ARRAY A

99 -99 -55 0 50

TEST CASE NUMBER 2
PARAMETERS ON INPUT
N = 5
A( 1) = 99
A( 2) = -99
A( 3) = -55
A( 4) = 0
A( 5) = 50

PARAMETERS ON OUTPUT
A( 1) = -99
A( 2) = -55
A( 3) = 0
A( 4) = 50
A( 5) = 99

THE RAW PROGRAM TOOK 51 STEPS TO EXECUTE THIS TEST CASE
HIT RETURN TO CONTINUE

PLEASE VERIFY THAT DATA IS CORRECT
TYPE A YES OR NO ****

YES


WHAT NEW TYPES OF MUTANTS ARE TO BE CONSIDERED ?
ALL


MUTATION PHASE
POST RUN PHASE
NUMBER OF TEST CASES = 2

NUMBER OF LIVE MUTANTS = SI
NUMBER OF MUTANTS = ? 4 1

PERCENTAGE OF ELIMINATED MUTANTS = f}.l>


MUTANT TYPES AND LIVE MUTANTS PROFILE

TYPE  MUTANTS  LIVE *   TYPE  MUTANTS  LIVE *   TYPE  MUTANTS  LIVE *

ALD      1      0*      CRT     1A      4*      SVR     4?      ?.
sfr     ?2      f«      CFS     30      2*      CfA     1?      0*
SFA     24      0*      AFC      F      ^*      AES     12      0*
A0P     12      (i*     »0R     i<      5*      GLR      4      y♦
PAN     1(5     1*      CSD      1      0*      WSR     15      4*

MUTANT ELIMINATION METHOD PROFILE

METHOD      COUNT*   METHOD      COUNT*   METHOD      COUNT*

TIMED-OUT     34*    REF UNDVAR    47*    SUBSCR RNG    ?5*
ARTH FAULT     0*    RDONLY V/F     0*    TRAP STMT     15*
EQUIV          0*    ZERO DIV       0*    WRONG ANS     75*

POST  RUN  RESULTS 
HELP

COMMANDS CAN USUALLY BE ABBREVIATED TO
TWO LETTERS. COMMANDS ARE AS FOLLOWS :

HELP - DISPLAY THIS HELP PAGE (CANNOT ABBREV.)
KILL - ABORT THE CURRENT RUN (CANNOT ABBREV.)
PROGRAM - TYPE THE PROGRAM BEING MUTATED
TESTCASE N - TYPE THE TESTCASE N
MUTANTS - TYPE ALL THE LIVE MUTANTS
MUTANTS (KEYWORD) (KEYWORD) ... (KEYWORD)
        - TYPE ONLY MUTANTS OF THE SPECIFIED TYPE
MUTANTS SELECT - SELECT THE MUTANTS MENU STYLE
MUTANTS KEYWORDS - SEE THE KEYWORDS FOR MUTANT TYPES
HEADER - DISPLAY THE PIMS RUN HEADER
CORRECTNESS - DISPLAY THE METHOD OF DETERMINING
        CORRECTNESS
RESULTS - DISPLAY THE RESULTS FOR MUTANTS CREATED
        IN THIS RUN
STATUS - DISPLAY THE STATUS OF ALL MUTANTS (INCLUDING
        PREVIOUS RUNS)
HALT - STOP THE CURRENT PIMS RUN
LOOP - ITERATE THE CURRENT RUN
OUTPUT TESTCASES - JUST THAT.
OUTPUT MUTANTS - OUTPUT ALL LIVE MUTANTS
OUTPUT MUTANTS (KEYWORD) (KEYWORD) ... (KEYWORD)
        OUTPUT ONLY MUTANTS OF THE INDICATED TYPE
OUTPUT MUTANTS RANDOM - OUTPUT ONE RANDOM MUTANT OF
        EACH TYPE
OUTPUT MUTANTS RANDOM (KEYWORD) (KEYWORD) (KEYWORD)
POST RUN RESULTS
HALT
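
The run above can be mirrored in a few lines of Python to show why the second test case matters. The sketch below transliterates the listed sort routine and applies one hypothetical mutant, in which the lower bound of the DO loop is changed from 2 to 3. That mutant is an illustration only; the transcript does not show which mutants PIMS generated. The names `bubble_sort`, `killed_by`, and `percent_eliminated` are likewise assumptions, and the percentage formula is the natural reading of the "PERCENTAGE OF ELIMINATED MUTANTS" line rather than a documented definition.

```python
def bubble_sort(values, first=2):
    """Bubble sort with early termination, following the FORTRAN listing.

    `first` is the starting value of the DO index I (2 in the raw program).
    Passing another value simulates a constant-replacement mutant.
    """
    a = list(values)
    n = len(a)
    if n <= 1:                           # IF (N.LE.1) GOTO 300
        return a
    done = False
    while not done:                      # loop back to label 100
        done = True                      # SORTED = 1
        for i in range(first, n + 1):    # DO 200 I = first,N (1-based index)
            if a[i - 2] <= a[i - 1]:     # IF (A(I-1).LE.A(I)) GOTO 200
                continue
            a[i - 2], a[i - 1] = a[i - 1], a[i - 2]
            done = False                 # SORTED = 0
    return a


def mutant_sort(values):
    """Hypothetical mutant: DO 200 I = 3,N instead of I = 2,N."""
    return bubble_sort(values, first=3)


def killed_by(mutant, raw, tests):
    """Return the 1-based numbers of the test cases that eliminate the mutant."""
    return [k for k, t in enumerate(tests, start=1) if mutant(t) != raw(t)]


def percent_eliminated(total, live):
    """Assumed formula: share of mutants that are no longer live."""
    return 100.0 * (total - live) / total


# The two test cases entered in the sample run.
TEST_CASES = [[1, 2, 3, 4, 5], [99, -99, -55, 0, 50]]

print(killed_by(mutant_sort, bubble_sort, TEST_CASES))
print(percent_eliminated(total=10, live=2))  # illustrative numbers, not from the run
```

The already-sorted first test case cannot distinguish this mutant from the raw program, because neither version ever swaps. The second test case eliminates it, since the mutant never compares the first two elements and leaves 99 at the front.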




